Quantcast
Channel: John's Weblog » dbpedia
Viewing all articles
Browse latest Browse all 3

All roads lead to? Experiments with Gephi, Linked Data and Wikipedia

$
0
0

Gephi is “an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs”. Tony Hirst did a great blog post a while back showing how you could use Gephi together with DBpedia (a linked data version of Wikipedia) to map an influence network in the world of philosophy. Gephi offers a semantic web plugin which allows you to work with the web of linked data. I recommend you read Tony’s blog to get started with using that plugin with Gephi. I was interested to experiment with this plugin, and to look at what sort of geospatial visualisations could be possible.

If you want to follow all the steps in this post you will need to:

Initially I was interested to see if there were any interesting networks we might visualise between places. In order to see how Wikipedia relates one place to another was a simple case of going to the DBpedia SPARQL endpoint and trying the following query:

select distinct ?p
where
{
?s a <http://schema.org/Place> .
?o a <http://schema.org/Place> .
?s ?p ?o .
}

- where s and o are places, find me what ‘p’ relates them. I noticed two properties ‘http://dbpedia.org/ontology/routeStart‘ and ‘http://dbpedia.org/ontology/routeEnd‘ so I thought I would try to visualise how places round the world were linked by transport connections.  To find places connected by a transport link you want to find pairs ‘start’ and ‘end’ that are the route start and route end, respectively, of some transport link. You can do this with the following query:

select ?start ?end
where
{
?start a <http://schema.org/Place> .
?end a <http://schema.org/Place> .
?link <http://dbpedia.org/ontology/routeStart> ?start .
?link <http://dbpedia.org/ontology/routeEnd> ?end .
}

This gives a lot of data so I thought I would restrict the links to be only road links:

select ?start ?end
where
{?start a <http://schema.org/Place> .
?end a <http://schema.org/Place> .
?link <http://dbpedia.org/ontology/routeStart> ?start .
?link <http://dbpedia.org/ontology/routeEnd> ?end .
?link a <http://dbpedia.org/ontology/Road> . }

We are now ready to visualise this transport network in Gephi. Follow the steps in Tony’s blog to bring up the Semantic Web Importer. In the ‘driver’ tab make sure ‘Remote – SOAP endpoint’ is selected, and the EndPoint URL is http://dbpedia.org/sparql. In an analogous way to Tony’s blog we need to construct our graph so we can visualise it. To simply view the connections between places it would be enough to just add this query to the ‘Query’ tab:

construct {?start <http://foo.com/connectedTo> ?end}
where
{
?start a <http://schema.org/Place> .
?end a <http://schema.org/Place> .
?link <http://dbpedia.org/ontology/routeStart> ?start .
?link <http://dbpedia.org/ontology/routeEnd> ?end .
?link a <http://dbpedia.org/ontology/Road> .
}

However, as we want to visualise this in a geospatial context we need the lat and long of the start and end points so our construct query becomes a bit more complicated:

prefix gephi:<http://gephi.org/>
construct {
?start gephi:label ?labelstart .
?end gephi:label ?labelend .
?start gephi:lat ?minlat .
?start gephi:long ?minlong .
?end gephi:lat ?minlat2 .
?end gephi:long ?minlong2 .
?start <http://foo.com/connectedTo> ?end}
where
{
?start a <http://schema.org/Place> .
?end a <http://schema.org/Place> .
?link <http://dbpedia.org/ontology/routeStart> ?start .
?link <http://dbpedia.org/ontology/routeEnd> ?end .
?link a <http://dbpedia.org/ontology/Road> .
{select ?start (MIN(?lat) AS ?minlat) (MIN(?long) AS ?minlong) where {?start <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat . ?start <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long .} }
{select ?end (MIN(?lat2) AS ?minlat2) (MIN(?long2) AS ?minlong2) where {?end <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat2 . ?end <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long2 .} }
?start <http://www.w3.org/2000/01/rdf-schema#label> ?labelstart .
?end <http://www.w3.org/2000/01/rdf-schema#label> ?labelend .
FILTER (lang(?labelstart) = ‘en’)
FILTER (lang(?labelend) = ‘en’)
}

Note that query for the lat and long is a bit more complicated that it might be. This is because DBpedia data is quite messy, and many entities will have more than one lat/long pair. I used a subquery in SPARQL to pull out the minimum lat/long for all the pairs retrieved. Additionally I also retrieved the English labels for each of the start/end points.

Now copy/paste this construct query into the ‘Query’ tab on the Semantic Web Importer:

Screen Shot 2014-03-26 at 15.54.34

Now hit the run button and watch the data load.

To visual the data we need to do a bit more work. In Gephi click on the ‘Data Laboratory’ and you should now see your data table. Unfortunately all of the lats and longs have been imported as strings and we need to recast them as decimals. To do this click on the ‘More actions’ pull down menu and look for ‘Recast column’ and click it. In the ‘Recast manipulator’ window go to ‘column’ and select ‘lat(Node Table)’ from the pull down menu. Under ‘Convert to’ select ‘Double’ and click recast. Do the same for ‘long’.

Screen Shot 2014-03-26 at 16.01.19

when you are done click ‘ok’ and return to the ‘overview’ tab in Gephi. To see this data geospatially go to the layout panel and select ‘Geo Layout’. Change the latitude and longitude to your new recast variable names, and unclick ‘center’ (my graph kept vanishing with it selected). Experiment with the scale value:

Screen Shot 2014-03-26 at 16.09.49

You should now see something like this:

Screen Shot 2014-03-26 at 16.11.13

in your display panel (click image to view in higher resolution).

Given that this is supposed to be a road network you will find some oddities. This it seems to down to ‘European routes’ like European route E15 that link from Scotland down to Spain.



Viewing all articles
Browse latest Browse all 3

Latest Images

Trending Articles





Latest Images