While preparing a data extract I recently discovered a set of Discogs data which I've begun calling "special entities". These are artists or labels which aren't actually artists or labels, but are intended to work as a meta-entity for grouping or classification purposes.

Why is it worth calling these out? Depending on how you use the Discogs database or API, following links to these artists or labels can cause performance problems for little benefit, because some of them link to a huge number of entities.

Special artists

There is a set of data entries filed as "artists" which aren't really musical artists.

Folk and Traditional are both "artists" which don't refer to any one artist, but are intended as catch-all references for musical compositions, arrangements, etc. that have been passed down the generations.

There are also some copyright or legal-based artists such as Derechos Reservados (roughly "All Rights Reserved" in English) and numerous Public Domain designations.

Perhaps most annoyingly, an actual artist called Public Domain has been re-used to designate a lot of public domain recordings and releases. This mixed use means that it's difficult to separate the genuine music by Public Domain from the other releases.

Finally there are catch-all artists, such as Anonymous and localised variants. Fortunately, No Artist and Unknown Artist are at least placeholders, with no real links in the database to cause problems.
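If you want to skip these entries when processing artist data, one option is a simple name check. The following is a minimal sketch rather than anything Discogs provides: the set only contains the names mentioned in this post, the function names are my own, and it assumes you already have artist names as plain strings.

```python
# Names taken from this post; the list is illustrative, not exhaustive.
SPECIAL_ARTIST_NAMES = {
    "folk",
    "traditional",
    "derechos reservados",
    "public domain",  # also a genuine artist, so real releases are dropped too
    "anonymous",
    "no artist",
    "unknown artist",
}


def is_special_artist(name: str) -> bool:
    """Return True if the artist name matches one of the known meta-entities."""
    return name.strip().casefold() in SPECIAL_ARTIST_NAMES


def real_artists(names):
    """Keep only the artist names that are not special entities."""
    return [n for n in names if not is_special_artist(n)]


print(real_artists(["Depeche Mode", "Traditional", "Unknown Artist"]))
# prints ['Depeche Mode']
```

Matching on names is crude, and localised variants of Anonymous would need adding by hand, but it is enough to keep the largest meta-entities out of a join.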

Special labels

Perhaps the biggest performance issue I encountered when generating Discogs dumps was with Not On Label. This links to an enormous number of releases, all of which are, of course, not designated as released by a record label.

It goes further than that: a number of popular artists also have their own special labels named with "Not On Label", for example Not On Label (Depeche Mode), which lists all of Depeche Mode's releases not on a record label.
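Both forms of this label can be spotted from the name alone. Here is a rough sketch, assuming label names arrive as plain strings in the "Not On Label" or "Not On Label (Artist)" shape described above; the function name and the returned tuple are my own invention, not part of any Discogs API.

```python
import re

# Matches "Not On Label" on its own, or followed by a parenthesised artist name.
NOT_ON_LABEL = re.compile(
    r"^Not On Label(?:\s*\((?P<artist>.+)\))?$", re.IGNORECASE
)


def parse_not_on_label(label_name: str):
    """Return (is_special, artist) for a label name.

    artist is None for the plain "Not On Label" entry and for ordinary labels.
    """
    match = NOT_ON_LABEL.match(label_name.strip())
    if match is None:
        return False, None
    return True, match.group("artist")


print(parse_not_on_label("Not On Label (Depeche Mode)"))
# prints (True, 'Depeche Mode')
```

Checking this before following a label's release links lets you skip the giant Not On Label entry while still treating the per-artist variants separately if you need them.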


What this all means is you have to be careful when blindly working with the Discogs data set! Any more of these? Let me know in the comments and I'll update the post.

Thanks to Exile on Ontario St who made the image above available for sharing.
© 2016 elsten software limited, Unit 4934, PO Box 6945, London, W1A 6US, UK