Amongst the well-known sources of music metadata online, Wikipedia is often forgotten. MusicBrainz and Discogs are the free sources that get most of the attention, but it is rarely mentioned that Wikipedia contains several hundred thousand metadata records, mostly for mainstream releases.

Not only that, but the metadata provided by Wikipedia includes album artwork, which means it's a possible source for people interested in finding and downloading cover art.

That's why OneMusicAPI includes Wikipedia music metadata in our database!

Why isn't Wikipedia discussed more?

I suppose, given that Wikipedia has only about 10% of the releases of the likes of Discogs, it's not surprising it is not the first choice for developers writing music apps. Nevertheless there are still advantages to using it.

It covers a wide range of mainstream releases, providing cover art for many of them. The cover art tends to be small, but this might be of use for certain apps for mobile devices, for example.

Being a separate source of data, it also helps with app availability. Availability problems with MusicBrainz and Discogs are not exactly unknown, so having a fallback can be valuable.

Perhaps the reason it is not used more is that the integration is a bit more involved than with other APIs. Wikipedia does not offer a music-oriented API. Instead, you work at the document level. Once you have found the Wikipedia document that refers to the release you are seeking, you must then parse the Wikipedia markup to find the information you need.
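To give a feel for what that parsing involves, here is a minimal Python sketch that pulls a cover file name out of raw article markup. It assumes the article uses an album infobox with a `cover` parameter written on its own line; the sample markup and the `find_cover_filename` helper are made up for illustration and are not how OneMusicAPI does it.

```python
import re

# Matches an infobox parameter line such as "| cover = Example.jpg".
# Assumption: one parameter per line, which real articles do not guarantee.
COVER_PARAM = re.compile(
    r"^[ \t]*\|[ \t]*cover[ \t]*=[ \t]*(\S.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_cover_filename(wikitext):
    """Return the cover file name from album infobox markup, or None."""
    match = COVER_PARAM.search(wikitext)
    if match is None:
        return None
    # Some articles write the value with a "File:" or "Image:" prefix.
    return re.sub(r"^(?:File|Image):", "", match.group(1), flags=re.IGNORECASE)


# Hypothetical markup, modelled on a typical album infobox.
SAMPLE_WIKITEXT = """{{Infobox album
| name   = Given to the Wild
| type   = studio
| artist = The Maccabees
| cover  = the-maccabees-given-to-the-wild.jpg
}}"""

print(find_cover_filename(SAMPLE_WIKITEXT))
```

Even this small pattern has to guess at spacing, capitalisation and prefixes, which hints at why this approach needs ongoing care.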

Parsing the unstructured information in the Wikipedia document is not impossible but it is prone to maintenance challenges. It turns out that there's another source of Wikipedia data which does all this work for you: DBpedia.

DBpedia is a project which extracts structured information from Wikipedia and presents it via a few different linked data mechanisms (for example, RDF triples). This makes it easier to look up the data you are interested in. The steps for finding album artwork on Wikipedia become:

  1. Use DBpedia to find the release you are interested in and, within the release record, the name of the cover art resource
  2. Use the Wikipedia API to download the cover art resource

So we still use the Wikipedia API, but only for the final step in downloading the cover art resource.

Let's look at these steps in more detail.

Step one: find your release using DBpedia

DBpedia's data is actually available in two main ways: online via their SPARQL interface, or via their data dumps. To ensure availability, OneMusicAPI uses the latter, but to get started quickly you may want to try the former. I'll cover online access here.

DBpedia expose their SPARQL endpoint at http://dbpedia.org/sparql. At this URL you can test your SPARQL query in the browser and to this URL you can send HTTP requests programmatically to execute your SPARQL queries and retrieve results.

A discussion of SPARQL is beyond the scope of this post, so I'll just introduce a query, say what it does, and leave refining it as an exercise for the reader... ok?

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia2: <http://dbpedia.org/property/>
# Note: "owl" is bound here to the DBpedia ontology namespace, not the W3C OWL namespace
PREFIX owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?name ?coverArtVar WHERE {
	# The album's name
	?subject dbpedia2:name ?name .
	?subject rdfs:label ?label .
	# The artist may be linked via either the raw property or the ontology property
	{ ?subject dbpedia2:artist ?artist } UNION { ?subject owl:artist ?artist }
	{ ?artist rdfs:label "The Maccabees"@en } UNION { ?artist dbpedia2:name "The Maccabees"@en }
	# Restrict to albums and pick up the cover art file name
	?subject rdf:type <http://dbpedia.org/ontology/Album> .
	?subject dbpedia2:cover ?coverArtVar .
}
LIMIT 10

This query selects all albums by The Maccabees and lists the name of the album and the cover art file stored on Wikipedia. Here's the current output (before the band's predicted 2014 album):

name				coverArtVar
"Wall of Arms"@en		"maccs.jpg"@en
"Given to the Wild"@en		"the-maccabees-given-to-the-wild.jpg"@en
"Colour It In"@en		"ColourItIn.jpg"@en

Each of their current three albums has the cover art file recorded. We can then use these cover art file names to look up the album artwork on Wikipedia.
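The query hard-codes the artist name, so one obvious refinement is to generate it for any artist. Here is a rough Python sketch of that idea; `build_album_query` and `sparql_string_literal` are names I have made up for illustration, and the escaping covers only backslashes, double quotes and line breaks, which I am assuming is enough for artist names.

```python
QUERY_TEMPLATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?name ?coverArtVar WHERE {{
    ?subject dbpedia2:name ?name .
    ?subject rdfs:label ?label .
    {{ ?subject dbpedia2:artist ?artist }} UNION {{ ?subject owl:artist ?artist }}
    {{ ?artist rdfs:label {artist} }} UNION {{ ?artist dbpedia2:name {artist} }}
    ?subject rdf:type <http://dbpedia.org/ontology/Album> .
    ?subject dbpedia2:cover ?coverArtVar .
}}
LIMIT {limit}
"""


def sparql_string_literal(text, lang="en"):
    """Quote text as a language-tagged SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"@' + lang


def build_album_query(artist, limit=10):
    """Fill the album query template with an escaped artist name."""
    return QUERY_TEMPLATE.format(
        artist=sparql_string_literal(artist), limit=int(limit)
    )


print(build_album_query("The Maccabees"))
```

Escaping the name before it goes into the query matters as soon as the artist comes from user input, since an unescaped double quote would otherwise break (or alter) the query.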

Of course, the exact mechanism by which you pass the SPARQL query to DBpedia may differ depending on your programming language and the availability of any tools. A cross-platform option is to send an HTTP GET and parse the resulting XML:

http://dbpedia.org/sparql?format=XML&default-graph-uri=http%3A%2F%2Fdbpedia.org&query=[query here]
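As a minimal sketch of that GET-and-parse approach in Python, using only the standard library: the function names are my own, and the parser assumes the response follows the W3C SPARQL Query Results XML Format (a `<sparql>` root in the `http://www.w3.org/2005/sparql-results#` namespace), which is worth confirming against what the endpoint actually returns.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"
# Assumed namespace of the SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def build_sparql_url(query):
    """URL-encode the query and attach it to the endpoint."""
    params = {
        "format": "XML",
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
    }
    return SPARQL_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_sparql_xml(xml_text):
    """Turn each <result> into a dict of variable name to plain value."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            value = binding.find("*")  # first child: literal, uri or bnode
            row[binding.get("name")] = (value.text or "") if value is not None else ""
        rows.append(row)
    return rows


def run_query(query):
    """Network call: send the query to DBpedia and parse the response."""
    with urllib.request.urlopen(build_sparql_url(query), timeout=30) as response:
        return parse_sparql_xml(response.read())
```

Calling `run_query` with the query text should give back a list of rows, each with a `name` and a `coverArtVar` key. In the XML the language tag arrives as an attribute rather than as a trailing `@en`, so the values come out as plain file names.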

Step two: use the Wikipedia API to download the artwork

Wikipedia can then be called to download the cover art. Take the cover art filename and append it to:

http://en.wikipedia.org/w/api.php?format=xml&action=query&prop=imageinfo&iiprop=url|size&titles=File:[filename]

For the Given to the Wild cover, this results in:

<api>
    <query-continue>
        <imageinfo iistart="2014-07-19T18:03:55Z"/>
    </query-continue>
    <query>
        <normalized>
            <n from="File:the-maccabees-given-to-the-wild.jpg" to="File:The-maccabees-given-to-the-wild.jpg"/>
        </normalized>
        <pages>
            <page pageid="34369601" ns="6" title="File:The-maccabees-given-to-the-wild.jpg" imagerepository="local">
                <imageinfo>
                    <ii size="17267" width="300" height="300" url="http://upload.wikimedia.org/wikipedia/en/b/b6/The-maccabees-given-to-the-wild.jpg" descriptionurl="http://en.wikipedia.org/wiki/File:The-maccabees-given-to-the-wild.jpg"/>
                </imageinfo>
            </page>
        </pages>
    </query>
</api>

The URL from which the actual image can be downloaded is then given by the url attribute on any resulting <imageinfo>/<ii> elements.
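Here is a hedged Python sketch of this second step, again standard library only. The function names and the `USER_AGENT` value are placeholders of my own; I am assuming the response has the shape shown above and that you will want to identify your app with a descriptive User-Agent header.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

WIKIPEDIA_API = "http://en.wikipedia.org/w/api.php"
# Hypothetical value; replace with something that identifies your own app.
USER_AGENT = "ExampleCoverArtFetcher/0.1 (you@example.com)"


def build_imageinfo_url(filename):
    """Build the imageinfo request URL for a cover art file name."""
    params = {
        "format": "xml",
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "url|size",
        "titles": "File:" + filename,
    }
    return WIKIPEDIA_API + "?" + urllib.parse.urlencode(params)


def extract_image_urls(xml_text):
    """Collect the url attribute of every <ii> element in the response."""
    root = ET.fromstring(xml_text)
    return [ii.get("url") for ii in root.iter("ii") if ii.get("url")]


def _get(url):
    """Network call: fetch a URL and return the raw response bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def fetch_cover_art(filename):
    """Look up the file on Wikipedia and download the first image found."""
    urls = extract_image_urls(_get(build_imageinfo_url(filename)))
    return _get(urls[0]) if urls else None
```

With this in place, `fetch_cover_art("the-maccabees-given-to-the-wild.jpg")` should return the image bytes, or `None` if Wikipedia reports no image for that file name.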

So there we have it, just two steps to cross-platform access to Wikipedia album artwork. Happy querying!

Thanks to Darwin Bell who made the image above available for sharing.