Metadata personalisation using music data pipes

One thing I've learnt building bliss is that my original desire for a tool to fix music library consistency was shared by others. Now that I've moved bliss's basic metadata lookup code into the cloud in the guise of OneMusicAPI, I'm thinking about how other aspects of bliss could be integrated into OneMusicAPI, and how they could help its users.

bliss's focus on intra-library consolidation and consistency can be seen as a focus on personalisation. For example, "I want all my albums to fit into one of these genres" or "I want all track numbers left-padded, where necessary, to three characters". If bliss's users want control over how their music libraries are presented, it stands to reason that OneMusicAPI's users (or their users!) do too.

So the far-off vision for OneMusicAPI may be MYMusicAPI (and yes, I have purchased that domain name!).

Personalisation examples

Personalisation here is all about the metadata you get back when querying OneMusicAPI. When considering metadata such as release titles, track numbers or genre names, I tend to split preferences about how it is presented into two types: semantic and syntactic.

Semantic preferences govern the meaning of the metadata, potentially altering it according to some heuristic. Genre is a good example. OneMusicAPI's data is sourced from a number of crowd-sourced databases, so genre entries can vary wildly: some are very general, others very specific.

Semantic control over genre would mean the genres OneMusicAPI returns fit your requirements. Want all reported genres to be high-level ("classical", "rock")? Fine.
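A semantic preference like this could be expressed as a lookup from specific genres to high-level ones. The sketch below is a minimal, hypothetical example: the mapping table and the function name `to_high_level_genre` are illustrative assumptions, not part of OneMusicAPI, and a real table would need to cover far more genres.

```python
# Hypothetical mapping from specific (crowd-sourced) genres to high-level ones.
# These entries are illustrative only; a real table would be much larger.
HIGH_LEVEL_GENRES = {
    "britpop": "Pop/Rock",
    "shoegaze": "Pop/Rock",
    "baroque": "Classical",
    "romantic": "Classical",
}


def to_high_level_genre(genre: str) -> str:
    """Map a specific genre to a high-level one, ignoring case and whitespace.

    Genres not in the table are returned unchanged, so the caller
    can decide how to handle anything unmapped.
    """
    return HIGH_LEVEL_GENRES.get(genre.strip().lower(), genre)


print(to_high_level_genre("Britpop"))  # Pop/Rock
print(to_high_level_genre("Baroque"))  # Classical
print(to_high_level_genre("Jazz"))     # Jazz (unmapped, passed through)
```

Passing unmapped genres through is one design choice; a stricter preference might instead map anything unknown to a catch-all such as "Other".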

Syntactic preferences dictate how the actual character string data is presented. Take track numbers: you could require that all track numbers are at least three digits long, left-padded with zeroes where necessary.
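A syntactic preference only changes the form of the string, not its meaning. Here is a minimal sketch of the track number rule, using Python's built-in `str.zfill`; the function name `pad_track_number` and its `width` parameter are hypothetical, chosen for illustration.

```python
def pad_track_number(track: str, width: int = 3) -> str:
    """Left-pad a numeric track number with zeroes to at least `width` digits.

    Values that are already `width` digits or longer are left as they are.
    """
    track = track.strip()
    if not track.isdigit():
        # Assumption: non-numeric values (e.g. vinyl-style "A1") are left untouched.
        return track
    return track.zfill(width)


print(pad_track_number("7"))     # 007
print(pad_track_number("12"))    # 012
print(pad_track_number("1234"))  # 1234
print(pad_track_number("A1"))    # A1
```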

For someone wanting to implement such a system, what's the best way of achieving this?

Music data pipes

I've always considered "pipes" to be a good model for thinking about music metadata.

First, metadata is drawn from a given source, be it MusicBrainz, Discogs, Wikipedia or wherever. Then a series of "pipes" (you could call these "transformers") acts on the data. For instance, I might want only high-level genres returned, all in capitals.

For example, suppose "Britpop" is returned as the genre for a release. A first pipe converts this value to "Pop/Rock", then a second pipe transforms it to upper case: "POP/ROCK".
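The Britpop example can be sketched as two small functions applied in order. This is a hypothetical illustration, not OneMusicAPI code: `GENRE_MAP`, `high_level_genre` and `upper_case` are names invented here, and the one-entry mapping stands in for a real genre table.

```python
# Hypothetical pipes; neither is part of OneMusicAPI.
GENRE_MAP = {"britpop": "Pop/Rock"}


def high_level_genre(genre: str) -> str:
    """First pipe: map a specific genre to a high-level one (pass-through if unknown)."""
    return GENRE_MAP.get(genre.lower(), genre)


def upper_case(value: str) -> str:
    """Second pipe: transform the value to upper case."""
    return value.upper()


genre = "Britpop"
for pipe in (high_level_genre, upper_case):
    genre = pipe(genre)  # each pipe's output is the next pipe's input

print(genre)  # POP/ROCK
```

Order matters here: running `upper_case` first would produce "BRITPOP", which this particular mapping only still recognises because `high_level_genre` lower-cases its input.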

This is a nice way to think about music metadata because each pipe is stateless: it's character string in, character string out. The name is obviously influenced by Unix pipes, which are an undeniably powerful model.
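Because each pipe is just a function from string to string, any sequence of pipes can be folded into a single pipe, much as a shell joins commands with `|`. The following is a minimal sketch, assuming pipes are plain Python callables; `compose` and `tidy_title` are hypothetical names for illustration.

```python
from functools import reduce
from typing import Callable

# A pipe is any function from string to string.
Pipe = Callable[[str], str]


def compose(*pipes: Pipe) -> Pipe:
    """Join stateless pipes left to right, like `a | b | c` in a shell.

    The result is itself a string-to-string pipe, so it can be composed again.
    """
    return lambda value: reduce(lambda acc, pipe: pipe(acc), pipes, value)


# A hypothetical per-user preference: trim whitespace, then upper-case.
tidy_title = compose(str.strip, str.upper)

print(tidy_title("  Parklife "))  # PARKLIFE
print(compose()("unchanged"))     # unchanged (an empty pipeline returns its input)
```

Since the pipes hold no state, a composed pipeline like this could in principle be built once from a user's stored preferences and reused for every value it processes.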

For the future!

Thanks to Angie Harms, who made the image above available for sharing.