FreeDB is a database of audio CDs, containing each CD's title, overall artist, year of release and details about each individual track. It can be queried in a few ways, most commonly using a CD's TOC (table of contents), which yields a set of matching CDs with their FreeDB IDs. An ID can then be used to look up the individual CD's information.
There's no artwork in FreeDB, but it is possible to find cover art using a FreeDB ID via OneMusicAPI.
FreeDB's dataset remains the largest of the free databases. For that reason it is used by software and hardware that rip CDs, or that simply want a fallback for finding more musical metadata. Unfortunately, its history means it has emerged without the level of data format strictness common in the alternative databases. Title strings, and the ambiguities that arise when parsing them, are one example of this.
When querying FreeDB for the data for an individual CD, here's some sample output [abridged]:
```
...
#
DISCID=1c029503
DTITLE=Sugababes / Freak like me
DYEAR=2003
DGENRE=Pop/Funk
TTITLE0=Freak like me [radio edit]
TTITLE1=Freak like me [we dont give a damn mix]
TTITLE2=Breathe easy
EXTD=
...
```
FreeDB output is divided into a set of common fields that are repeated for each CD: DISCID, DTITLE, TTITLE0 and others. Data about the disc is prefixed with a 'D'; data about individual tracks is prefixed with a 'T' and suffixed with the track position (zero-based).
So we can see there are two types of title: one for the disc and one for each track, DTITLE and TTITLEn respectively.
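As a rough illustration of this field layout, here is a minimal Python sketch that reads KEY=value lines into a dictionary and collects the TTITLEn values in track order. The function names are hypothetical, and concatenating repeated keywords (on the assumption that a long value may be split across several lines sharing the same keyword) reflects our reading of the format rather than a complete implementation of the FreeDB specification.

```python
import re

# Matches TTITLE0, TTITLE1, ... and captures the zero-based track index.
TRACK_KEY = re.compile(r"^TTITLE(\d+)$")


def parse_xmcd(text):
    """Parse FreeDB KEY=value lines into a dict (hypothetical helper).

    Lines starting with '#' are comments and are skipped, as are lines without
    an '='. If a keyword repeats, its values are concatenated, assuming a long
    field has been split across several lines.
    """
    fields = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        fields[key] = fields.get(key, "") + value
    return fields


def track_titles(fields):
    """Return the TTITLEn values ordered by their zero-based track position."""
    tracks = {}
    for key, value in fields.items():
        match = TRACK_KEY.match(key)
        if match:
            tracks[int(match.group(1))] = value
    return [tracks[i] for i in sorted(tracks)]


sample = """#
DISCID=1c029503
DTITLE=Sugababes / Freak like me
DYEAR=2003
DGENRE=Pop/Funk
TTITLE0=Freak like me [radio edit]
TTITLE1=Freak like me [we dont give a damn mix]
TTITLE2=Breathe easy
EXTD=
"""

fields = parse_xmcd(sample)
print(fields["DTITLE"])      # Sugababes / Freak like me
print(track_titles(fields))  # ['Freak like me [radio edit]', 'Freak like me [we dont give a damn mix]', 'Breathe easy']
```

Sorting on the integer index rather than the key string keeps TTITLE10 after TTITLE9 on discs with more than ten tracks.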
It's when we begin to inspect the contents of the title fields that we see how ambiguities can be introduced. On the subject of DTITLE, the FreeDB HOWTO states:
Technically, this may consist of any data, but by convention contains the artist and disc title (in that order) separated by a "/" with a single space on either side to separate it from the text. There may be other "/" characters in the DTITLE, but not with space on both sides, as that character sequence is exclusively reserved as delimiter of artist and disc title!
So in theory, we can parse the artist and release title by splitting the string on " / " and assigning the two values to the artist and the title respectively.
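A strict reading of the HOWTO translates almost directly into code. This minimal sketch (the function name is hypothetical) treats only the exact sequence " / " as the delimiter and gives up unless it appears exactly once:

```python
def split_dtitle_strict(dtitle):
    """Split a DTITLE into (artist, title) following the HOWTO convention.

    Only the exact sequence " / " counts as the delimiter; other slashes are
    left alone. Returns None unless the delimiter appears exactly once.
    """
    parts = dtitle.split(" / ")
    if len(parts) != 2:
        return None
    artist, title = parts
    return artist, title


print(split_dtitle_strict("Sugababes / Freak like me"))  # ('Sugababes', 'Freak like me')
print(split_dtitle_strict("AC/DC / Back in Black"))       # ('AC/DC', 'Back in Black')
print(split_dtitle_strict("Sugababes/Freak like me"))     # None
```

Returning None rather than guessing leaves the caller free to decide what to do with strings that don't follow the convention.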
So much for theory! In practice, FreeDB is a community-maintained database and its dataset is not 100% perfect. In the wild we see several different variations on this format.
DTITLE=Sugababes/Freak like me
This first one is easy to deal with: there's no space on either side of the slash separating the artist from the title, so our parser just needs to be a little more lenient.
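One way to add that lenience is sketched below. It prefers the strict " / " delimiter and otherwise falls back to a slash with any surrounding whitespace, refusing to guess if the fallback yields more than two parts. The function name and the fallback rule are our own assumptions, not part of FreeDB:

```python
import re

# A slash with any amount of surrounding whitespace, including none.
LOOSE_DELIMITER = re.compile(r"\s*/\s*")


def split_dtitle_lenient(dtitle):
    """Split on " / " if it appears exactly once; otherwise try a bare "/".

    Returns None when the fallback is still ambiguous (more than one slash).
    """
    if dtitle.count(" / ") == 1:
        artist, title = dtitle.split(" / ")
        return artist, title
    parts = LOOSE_DELIMITER.split(dtitle)
    if len(parts) == 2:
        return parts[0].strip(), parts[1].strip()
    return None


print(split_dtitle_lenient("Sugababes/Freak like me"))   # ('Sugababes', 'Freak like me')
print(split_dtitle_lenient("Sugababes /Freak like me"))  # ('Sugababes', 'Freak like me')
print(split_dtitle_lenient("AC/DC/Back in Black"))       # None
```

The fallback only trusts a bare slash when it is the only slash in the string, since an artist such as "AC/DC" would otherwise be split in the wrong place.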
TTITLE1=Sugababes / Freak like me/we dont give a damn mix
A slightly different variation, this time in a track title. Again we can add lenience: because there are spaces around the first slash, we know that what precedes it is the artist and what follows it is the track title, Freak like me/we dont give a damn mix.
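For a track title like this, the rule amounts to splitting on the first spaced delimiter only. A minimal sketch using Python's built-in str.partition (the function name is hypothetical):

```python
def split_track_title(ttitle):
    """Split a track title that carries an artist prefix.

    The first " / " is taken as the artist delimiter; everything after it,
    including unspaced slashes, is kept as the track title. Returns
    (None, ttitle) when no spaced delimiter is present.
    """
    artist, sep, title = ttitle.partition(" / ")
    if not sep:
        return None, ttitle
    return artist, title


print(split_track_title("Sugababes / Freak like me/we dont give a damn mix"))
# ('Sugababes', 'Freak like me/we dont give a damn mix')
print(split_track_title("Breathe easy"))
# (None, 'Breathe easy')
```

Because it only looks at the first " / ", this function would quietly accept a title containing a second spaced delimiter and fold it into the track title; it does not detect that case on its own.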
It's probably obvious where this is heading...
TTITLE1=Sugababes / Freak like me / we dont give a damn mix
What to do here? The spaced-slash delimiter now appears twice, essentially creating three fields, which is illegal according to the HOWTO. The trouble is that it happens, albeit rarely, so we have to deal with it.
With no formal way of automating the decision, this is a case where the question must be put to the user: which parts of the title belong to the artist name, and which to the title? It's a shame, but automation gone wrong is often worse than no automation at all.
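The code cannot make the decision, but it can enumerate the choices. In this sketch, the function names are hypothetical and ask_user stands in for whatever prompt the application provides. It lists every way of placing the artist/title boundary at one of the " / " delimiters, applies the split automatically only when there is exactly one option, and otherwise hands the candidates to the user:

```python
def candidate_splits(title):
    """List every (artist, title) assignment, one per " / " delimiter.

    Each delimiter is tried as the boundary; the remaining delimiters stay
    inside whichever side they fall on.
    """
    parts = title.split(" / ")
    return [
        (" / ".join(parts[:i]), " / ".join(parts[i:]))
        for i in range(1, len(parts))
    ]


def resolve(title, ask_user):
    """Split automatically when unambiguous; otherwise defer to ask_user.

    ask_user is a hypothetical callback that receives the list of candidates
    and returns the one the user picked.
    """
    candidates = candidate_splits(title)
    if not candidates:
        return None, title  # no spaced delimiter at all
    if len(candidates) == 1:
        return candidates[0]
    return ask_user(candidates)


title = "Sugababes / Freak like me / we dont give a damn mix"
for option in candidate_splits(title):
    print(option)
# ('Sugababes', 'Freak like me / we dont give a damn mix')
# ('Sugababes / Freak like me', 'we dont give a damn mix')

# A stand-in for a real prompt that simply picks the first candidate.
print(resolve(title, lambda options: options[0]))
```

Keeping the well-formed case automatic means the user is only interrupted for the rare titles that genuinely break the convention.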
Thanks to wise.adam, who made the image above available for sharing.