The Distribution of tags is a difficult matter,
It isn't just one of your holiday games;
You may think at first I'm as mad as a hatter
When I tell you, a tag may be distributed in many different ways...
Ever since releasing the first version of HTML::TagCloud, I've been wondering about the distribution of tags. I mean, I had a hunch and square-rooted the tag count and it looked fairly pretty. But then I remembered Clay Shirky talking about power laws and the whole long tail hype. How are tags distributed? How might my module represent tag clouds in a better way?
There's only so much wild-assed guessing I can do, so I hacked up a quick script using Flickr::API to pick 42 random users from Flickr's recent photos, crawl their entire photo collections and find their tag distributions. What do I mean? Well, in my photo collection "london" is used a lot more than, say, "bamboo". In this case I'm not interested in the tags themselves but in the distribution of their counts. I produced a chart: each colour is a different Flickr user, and the values on the left are how many times that user has used a tag, the maximum count being 30 and the minimum (and most common) clearly being 1. See that slope! Looks awfully power law to me. So I changed my sqrt() to a log() and everything looks much prettier.
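To make the sqrt() versus log() change concrete, here is a minimal sketch in Python. It is not the code inside HTML::TagCloud: the function names, the five size levels and the sample counts are all made up for illustration. It tallies how many tags share each count, then maps a count onto a font-size level with either scaling.

```python
import math
from collections import Counter


def count_distribution(tag_counts):
    """Return a Counter mapping usage count -> number of tags with that count."""
    return Counter(tag_counts.values())


def size_level(count, max_count, levels=5, scale=math.log):
    """Map a tag count (>= 1) onto a font-size level in 0..levels-1.

    `scale` is the squashing function, e.g. math.sqrt or math.log.
    The result is normalised so count 1 gives level 0 and max_count
    gives the top level, whichever function is used.
    """
    if max_count <= 1:
        return 0
    low, high = scale(1), scale(max_count)
    fraction = (scale(count) - low) / (high - low)
    return round(fraction * (levels - 1))


# Hypothetical counts for one user (illustrative, not real Flickr data).
tag_counts = {"london": 30, "perl": 8, "recipe": 3, "bamboo": 1, "tea": 1}
distribution = count_distribution(tag_counts)
biggest = max(tag_counts.values())

for tag in sorted(tag_counts):
    n = tag_counts[tag]
    print(tag, n,
          size_level(n, biggest, scale=math.sqrt),
          size_level(n, biggest, scale=math.log))
```

Because log() is more sharply curved than sqrt(), it never gives a tag a smaller level than sqrt() does under this normalisation, which lifts the long tail of rarely used tags a little closer to the few heavily used ones.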
While I was at it I incorporated a patch from Dean Wilson and stole an idea from O'Reilly Radar to bunch the tags closer together. In time-honoured tradition, I present before and after screenshots of my recipe tags. Also, I hadn't really considered that people would use HTML::TagCloud with only a few tags - now it copes better with that case. Get HTML::TagCloud 0.32 from CPAN now!
obra and I have been wondering how we could represent more information with tag clouds. I reckon we can easily represent three dimensions: the font size, the text colour and the tag order (even though I strongly feel that tags, and menus, should be sorted alphabetically). Any other ideas? How much further can we push this boat?
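As a rough illustration of those three dimensions, here is a small Python sketch that renders each tag with a size class, a text colour and a position in the output. The `render_cloud` function, the `sizeN` class names and the hue values are assumptions for this example only; none of it is HTML::TagCloud's real output format.

```python
from html import escape


def render_cloud(tags):
    """Render tags as a string of HTML spans.

    `tags` maps a tag name to (size_level, hue): size_level is a small
    integer used in a CSS class name, hue is an HSL hue in degrees.
    """
    parts = []
    # Dimension 3: order. Alphabetical here, ignoring case.
    for name in sorted(tags, key=str.lower):
        level, hue = tags[name]
        parts.append(
            # Dimension 1: font size, left to a stylesheet via the class.
            f'<span class="size{level}" '
            # Dimension 2: text colour, set inline from the hue.
            f'style="color: hsl({hue}, 60%, 35%)">'
            f'{escape(name)}</span>'
        )
    return " ".join(parts)


# Hypothetical input: made-up levels and hues.
cloud = render_cloud({"london": (4, 0), "Bamboo": (0, 200), "fish&chips": (2, 40)})
print(cloud)
```

Swapping the sort key for a timestamp or a count would use the order to carry information instead, at the cost of the alphabetical layout.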
Use the temporal axis (Score:2)
You could represent the 'freshness' of a tag (how recently it's been assigned to new objects) by blinking the tag name more or less quickly.
I'm not suggesting you should. Merely that you could :-)
huh (Score:1)
Re:huh (Score:2)
light - medium - dark (Score:2)
How about splitting the colour into colour and intensity? Makes it a bit harder to read, but then again, you can't use all the possible text colours, either, since there has to be some contrast.
hmm. (Score:2)
a) Newness. Represented by the order of the tags
b) Quantity of photos tagged with the tag. Represented by Size
c) Interestingness, represented by intensity of the tag. Tags associated with photos that are on average more interesting are darker (or lighter, depending on your background colour)
d) Popularity. Tags that are associated with photos that are viewed more than others are "warmer".
So, for example, my tag argh [flickr.com] would be somewhere near the start of
CSS (Score:2)
themes/groups/categories (Score:1)
Using colour and intensity
Zipf's Law (Score:2)