The Distribution of tags is a difficult matter,
It isn't just one of your holiday games;
You may think at first I'm as mad as a hatter
When I tell you, a tag may be distributed in many different ways...
Ever since releasing the first version of HTML::TagCloud, I've been wondering about the distribution of tags. I mean, I had a hunch and square-rooted the tag count and it looked fairly pretty. But then I remembered Clay Shirky talking about power laws and the whole long tail hype. How are tags distributed? How might my module represent tag clouds in a better way?
There's only so much wild-assed guessing I can do, so I hacked up a quick script using Flickr::API to pick 42 random users from Flickr's recent photos, crawl their entire photo collections and find their tag distributions. What do I mean? Well, in my photo collection "london" is used a lot more than, say, "bamboo". In this case I'm not interested in the tags themselves but in the distribution of their counts. I produced a chart - each colour is a different Flickr user, and the scale on the left is how often that user has used a tag: the maximum count is 30 and the minimum (and most common) is clearly 1. See that slope! Looks awfully power law to me. So I changed my sqrt() to a log() and everything looks much prettier.
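To make the sqrt() versus log() difference concrete, here is a rough sketch in Python rather than Perl. It is not the code inside HTML::TagCloud: the function names, the five size levels and the sample counts are all made up for illustration. It counts how many tags share each usage count, then buckets tags into size levels after scaling their counts.

```python
import math
from collections import Counter


def count_distribution(tag_counts):
    """Map each usage count to the number of tags that have that count."""
    return Counter(tag_counts.values())


def font_levels(tag_counts, levels=5, scale=math.log):
    """Give each tag a size level in 0..levels-1 based on its scaled count.

    Hypothetical helper for illustration only, not HTML::TagCloud's API.
    """
    scaled = {tag: scale(count) for tag, count in tag_counts.items()}
    lo, hi = min(scaled.values()), max(scaled.values())
    span = hi - lo
    result = {}
    for tag, value in scaled.items():
        if span == 0:
            # Every tag has the same count, so they all share one level.
            result[tag] = 0
        else:
            # Clamp so the most used tag lands in the top level.
            result[tag] = min(levels - 1, int((value - lo) / span * levels))
    return result


# Made-up counts shaped like the chart: a few big tags, lots of ones.
tags = {"london": 30, "party": 8, "cat": 3, "bamboo": 1, "tea": 1, "rain": 1}

print(count_distribution(tags))
print(font_levels(tags, scale=math.sqrt))
print(font_levels(tags, scale=math.log))
```

With these invented counts, sqrt() leaves "cat" at the same level as the tags used once and puts "party" at level 2, while log() lifts them to levels 1 and 3. That spreading out of the middle is roughly why the log version looks better on a long-tailed distribution.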
While I was at it I incorporated a patch from Dean Wilson and stole an idea from O'Reilly Radar to bunch the tags closer together. In time-honoured tradition, I present before and after screenshots of my recipe tags. Also, I hadn't really considered that people would use HTML::TagCloud with only a few tags - now it copes better with that case. Get HTML::TagCloud 0.32 from CPAN now!
obra and I have been wondering how we could represent more information with tag clouds. I reckon we can easily represent three dimensions: the font size, the text colour and the tag order (even though I strongly feel that tags, and menus, should be sorted alphabetically). Any other ideas? How much further can we push this boat?
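As a thought experiment, the three dimensions could be sketched like this in Python. Nothing here is HTML::TagCloud's interface: render_cloud, the dictionary keys and the sample values are hypothetical, and the markup is just one plausible way to carry size and colour.

```python
from html import escape


def render_cloud(tags, order_key=None):
    """Render tags as HTML spans carrying three dimensions.

    Each tag is a dict with 'name', 'size' (pixels) and 'colour' (a CSS
    colour). The third dimension is the order, alphabetical by default.
    Hypothetical helper for illustration only.
    """
    if order_key is None:
        order_key = lambda tag: tag["name"].lower()
    spans = []
    for tag in sorted(tags, key=order_key):
        spans.append(
            '<span style="font-size: {}px; color: {}">{}</span>'.format(
                int(tag["size"]),
                escape(tag["colour"], quote=True),
                escape(tag["name"]),
            )
        )
    return " ".join(spans)


# Made-up example: size could follow the tag count and colour its age.
sample = [
    {"name": "london", "size": 24, "colour": "#c00"},
    {"name": "bamboo", "size": 12, "colour": "#999"},
]
print(render_cloud(sample))
# Order by size instead, biggest first, to use order as a real dimension.
print(render_cloud(sample, order_key=lambda tag: -tag["size"]))
```

Keeping the order as a pluggable key means the alphabetical default stays, and anything else has to be asked for explicitly.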