In the spirit of 'publish what you discover' I decided to make a record of a regular expression I recently used for analysing incoming MP4 atom fields. Specifically, breaking down the names of these fields to do something useful with the contents.
MP4s allow a variety of different metadata fields, and most of them use a four-character code to identify what lies within. For example, ©alb for the album name and trkn for the track number. On the other hand, some fields have much more long-winded codes:
----:com.apple.iTunes:iTunNORM
----:com.apple.iTunes:iTunes_CDDB_IDs
----:com.nullsoft.winamp:publisher
These are known as reverse DNS style fields. They are identified by the leading ---- characters, followed by a reverse DNS domain and finally the field name itself, with each part separated by a colon.
These fields are generally used less frequently, but they are present nonetheless, so anyone writing a tag editor that handles MP4 files has to deal with them.
The JAudioTagger API I use in bliss to read and write tags doesn't provide a way of writing a tag field with a plain text reverse DNS style name. Instead, an MP4FieldKey must be supplied, but as this is a Java enum I have to choose which one of the MP4FieldKeys is intended to be written. So I had to write a piece of code to look through all the MP4FieldKeys by examining the domain (called 'issuer') and field name ('descriptor') and choosing the one that matches. To extract the issuer and descriptor, I used this regex:
----:([a-zA-Z0-9\-\.]+):(.[^:]+)
This matches the four leading hyphens, then the issuer (the reverse DNS domain), then finally the descriptor. The issuer and descriptor are stored in capture groups for retrieval by your code. It doesn't allow for escaping within descriptor names, although I'm not even sure that is permitted. Adding it should be simple with a lookbehind.
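As a rough sketch of how the regex and the lookup might fit together, the following uses the expression above verbatim with java.util.regex. Note that FieldKey here is a hypothetical stand-in enum with made-up constant names, not JAudioTagger's own enum, and the class and method names are assumptions for illustration only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReverseDnsFieldParser {

    // The regex from the post, escaped for a Java string literal.
    // Group 1 is the issuer, group 2 is the descriptor.
    private static final Pattern REVERSE_DNS =
            Pattern.compile("----:([a-zA-Z0-9\\-\\.]+):(.[^:]+)");

    // Hypothetical stand-in for the tag library's field key enum.
    public enum FieldKey {
        ITUNES_NORM("com.apple.iTunes", "iTunNORM"),
        ITUNES_CDDB_IDS("com.apple.iTunes", "iTunes_CDDB_IDs"),
        WINAMP_PUBLISHER("com.nullsoft.winamp", "publisher");

        final String issuer;
        final String descriptor;

        FieldKey(String issuer, String descriptor) {
            this.issuer = issuer;
            this.descriptor = descriptor;
        }
    }

    // Returns the key whose issuer and descriptor both match the
    // plain text field name, or empty if the name is not in reverse
    // DNS style or no key matches.
    public static Optional<FieldKey> lookup(String fieldName) {
        Matcher m = REVERSE_DNS.matcher(fieldName);
        if (!m.matches()) {
            return Optional.empty();
        }
        String issuer = m.group(1);
        String descriptor = m.group(2);
        for (FieldKey key : FieldKey.values()) {
            if (key.issuer.equals(issuer) && key.descriptor.equals(descriptor)) {
                return Optional.of(key);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(lookup("----:com.apple.iTunes:iTunNORM")); // Optional[ITUNES_NORM]
        System.out.println(lookup("trkn"));                           // Optional.empty
    }
}
```

Using matches() rather than find() means the whole field name has to fit the pattern, so a plain four-character code such as trkn simply yields no result instead of a partial match.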
Thanks to Lasse Havelund for the image above.