Recently, I had the bright idea of writing a C library for id3 v2.4 tags. So I strolled over to www.id3.org to get the informal v2.4.0 standard. The first thing I noticed is that the standard has been split into two documents: one for the main structure of v2.4.0 and one describing each frame (there are 85-ish as of now).
I really wonder what kind of a lunatic you have to be to come up with such a draft. This must be the XML of tagging, and apparently those people didn't care in the least about the implementation side of such a standard (another parallel to XML there). That might explain why so far there is no software implementation of id3v2.4. This is quite telling considering that v2.4.0 was released in 2000.
According to this standard, the total size of such a tag (a tag means all the available frames combined) may add up to 256MB. This may sound like a lot, but it isn't. I am sure that this size can easily be reached when some of the more voluminous frames are used.
From a technical point of view, there are some real oddities in this standard. For instance, it uses the concept of synchsafe integers. A 4-byte integer is in fact not 32 bits wide, but only 28, because the most significant bit of each of the four bytes is always zero. This is a bad idea, because it means that a CRC-32 checksum (which can show up in the extended header where present) no longer fits into four bytes: it is stored in 5 bytes as a 35-bit synchsafe number, of which only 32 bits are actually used. Too bad my machine doesn't have 5-byte integers, so you have to do explicit encoding and decoding.
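To show what that explicit encoding and decoding amounts to, here is a minimal sketch in C. It assumes the bytes are stored most significant first, with 7 payload bits each (4 × 7 = 28 bits, hence the 256MB limit). The function names are my own and not taken from any existing library.

```c
#include <stdint.h>

/* Decode a 4-byte synchsafe integer: 7 payload bits per byte, MSB first. */
uint32_t synchsafe_decode32(const uint8_t b[4])
{
    return ((uint32_t)(b[0] & 0x7F) << 21) |
           ((uint32_t)(b[1] & 0x7F) << 14) |
           ((uint32_t)(b[2] & 0x7F) << 7)  |
            (uint32_t)(b[3] & 0x7F);
}

/* Encode a value below 2^28 as 4 synchsafe bytes.
 * Returns 0 on success, -1 if the value does not fit into 28 bits. */
int synchsafe_encode32(uint32_t v, uint8_t out[4])
{
    if (v >= (UINT32_C(1) << 28))
        return -1;
    out[0] = (uint8_t)((v >> 21) & 0x7F);
    out[1] = (uint8_t)((v >> 14) & 0x7F);
    out[2] = (uint8_t)((v >> 7) & 0x7F);
    out[3] = (uint8_t)(v & 0x7F);
    return 0;
}

/* Encode a full 32-bit CRC as 5 synchsafe bytes (35 bits available,
 * so the upper four bits of the first byte are always zero). */
void synchsafe_encode_crc(uint32_t crc, uint8_t out[5])
{
    for (int i = 4; i >= 0; i--) {
        out[i] = (uint8_t)(crc & 0x7F);
        crc >>= 7;
    }
}

/* Decode the 5-byte synchsafe CRC back into a plain 32-bit value. */
uint32_t synchsafe_decode_crc(const uint8_t b[5])
{
    uint64_t v = 0;
    for (int i = 0; i < 5; i++)
        v = (v << 7) | (uint64_t)(b[i] & 0x7F);
    return (uint32_t)v;
}
```

Nothing here is difficult, but every size field and the CRC have to go through this detour before you can use them.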
Another thing is the placement of frames. id3v2 tags used to be prepended to the file. Now they can be anywhere (yes, even embedded in the stream of audio data, which the paragraph below neglects to mention):
5. Tag location
The default location of an ID3v2 tag is prepended to the audio so
that players can benefit from the information when the data is
streamed. It is however possible to append the tag, or make a
prepend/append combination. When deciding upon where an unembedded
tag should be located, the following order of preference SHOULD be
considered.
1. Prepend the tag.
2. Prepend a tag with all vital information and add a second tag at
the end of the file, before tags from other tagging systems. The
first tag is required to have a SEEK frame.
3. Add a tag at the end of the file, before tags from other tagging
systems.
In case 2 and 3 the tag can simply be appended if no other known tags
are present. The suggested method to find ID3v2 tags are:
1. Look for a prepended tag using the pattern found in section 3.1.
2. If a SEEK frame was found, use its values to guide further
searching.
3. Look for a tag footer, scanning from the back of the file.
For every new tag that is found, the old tag should be discarded
unless the update flag in the extended header (section 3.2) is set.
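To give an idea of what this search means for an implementation, here is a rough sketch in C covering steps 1 and 3 on a file that has already been read into memory. It assumes the section 3.1 pattern is "ID3" followed by two version bytes other than 0xFF, a flags byte and four size bytes below 0x80, and that a footer looks the same with "3DI" as its identifier. Step 2 (following a SEEK frame) is left out, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the 10 bytes at p match the header/footer pattern:
 * 3-byte identifier, two version bytes != 0xFF, one flags byte,
 * four synchsafe size bytes (each < 0x80). */
static int id3_pattern_matches(const uint8_t *p, const char *magic)
{
    return memcmp(p, magic, 3) == 0 &&
           p[3] != 0xFF && p[4] != 0xFF &&
           !(p[6] & 0x80) && !(p[7] & 0x80) &&
           !(p[8] & 0x80) && !(p[9] & 0x80);
}

/* Step 1: look for a prepended tag. Returns offset 0 if found, else -1. */
long id3_find_prepended(const uint8_t *buf, size_t len)
{
    if (len >= 10 && id3_pattern_matches(buf, "ID3"))
        return 0;
    return -1;
}

/* Step 3: scan from the back of the file for a tag footer.
 * Returns the offset of the last footer candidate, or -1 if none. */
long id3_find_footer(const uint8_t *buf, size_t len)
{
    if (len < 10)
        return -1;
    for (size_t i = len - 9; i-- > 0; ) {
        if (id3_pattern_matches(buf + i, "3DI"))
            return (long)i;
    }
    return -1;
}
```

Note that a match in step 3 is only a candidate: the same ten bytes could just as well occur inside the audio data, so a real implementation would still have to validate the tag the footer claims to belong to.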
I wonder why these id3v2.4 tags are still called tags. It would have been much more plausible to standardize a new filetype, id3, with one frame: ATAU, for attached audio data. Then we'd have id3 players.