
All the Perl that's Practical to Extract and Report


acme (189)

acme
http://www.astray.com/

Leon Brocard (aka acme) is an orange-loving Perl eurohacker with many varied contributions to the Perl community, including the GraphViz module on the CPAN. YAPC::Europe was all his fault. He is still looking for a Perl Monger group he can start which begins with the letter 'D'.

Journal of acme (189)

Monday August 22, 2005
06:02 PM

The distribution of tags

[ #26424 ]

The Distribution of tags is a difficult matter,
It isn't just one of your holiday games;
You may think at first I'm as mad as a hatter
When I tell you, a tag may be distributed in many different ways...

Ever since releasing the first version of HTML::TagCloud, I've been wondering about the distribution of tags. I mean, I had a hunch and square-rooted the tag count and it looked fairly pretty. But then I remembered Clay Shirky talking about power laws and the whole long tail hype. How are tags distributed? How might my module represent tag clouds in a better way?

There's only so much wild-assed guessing I can do, so I hacked up a quick script using Flickr::API to pick 42 random users from Flickr's recent photos, crawl their entire photo collections and find their tag distributions. What do I mean? Well, in my photo collection "london" is used a lot more than, say, "bamboo". In this case I'm not interested in the tags themselves but in the distribution of their counts. I produced a chart: each colour is a different Flickr user, showing how often that user applied each of their tags, the maximum count being 30 and the minimum (and most common) clearly being 1. See that slope! Looks awfully power law to me. So I changed my sqrt() to a log() and everything looks much prettier.
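To make the sqrt() versus log() difference concrete, here is a minimal sketch in core Perl. It is not the actual HTML::TagCloud code: the helper name `scale_level`, the ten-level bucketing and the sample counts are all assumptions made up for illustration. It maps a raw tag count onto a font-size level using whichever scaling function you hand it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TagCloud): map a raw tag count
# onto one of $levels font-size buckets (0 .. $levels - 1), after passing
# the counts through the scaling function $f.
sub scale_level {
    my ($count, $min, $max, $levels, $f) = @_;
    my ($lo, $hi) = ($f->($min), $f->($max));
    return 0 if $hi == $lo;    # all counts equal: avoid dividing by zero
    my $ratio = ($f->($count) - $lo) / ($hi - $lo);
    my $level = int($ratio * $levels);
    return $level >= $levels ? $levels - 1 : $level;    # clamp the top bucket
}

# Made-up counts, loosely shaped like the chart: few big tags, many small ones.
my %count = (london => 30, perl => 12, recipe => 5, bamboo => 1);
my ($min, $max) = (1, 30);

for my $tag (sort { $count{$b} <=> $count{$a} } keys %count) {
    my $by_sqrt = scale_level($count{$tag}, $min, $max, 10, sub { sqrt $_[0] });
    my $by_log  = scale_level($count{$tag}, $min, $max, 10, sub { log $_[0] });
    printf "%-8s count=%2d  sqrt level=%d  log level=%d\n",
        $tag, $count{$tag}, $by_sqrt, $by_log;
}
```

With a long-tailed distribution most tags have small counts, and the log version spreads those small counts over more of the available levels than the square root does, which is presumably why the cloud ends up looking more balanced.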

While I was at it I incorporated a patch from Dean Wilson and stole an idea from O'Reilly Radar to bunch the tags closer together. In time-honoured tradition, I present before and after screenshots of my recipe tags. Also, I hadn't really considered that people would use HTML::TagCloud with only a few tags; it now copes better with that case. Get HTML::TagCloud 0.32 from CPAN now!

obra and I have been wondering how we could represent more information with tag clouds. I reckon we can easily represent three dimensions: the font size, the text colour and the tag order (even though I strongly feel that tags, and menus, should be sorted alphabetically). Any other ideas? How much further can we push this boat?
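As a rough sketch of what a three-dimensional cloud could look like, the snippet below renders each tag as an HTML span. Everything in it is an assumption for illustration: the field names (`count`, `hue`, `last_used`), the 12px to 36px size range and the `render_tag` / `render_cloud` helpers are hypothetical and are not part of HTML::TagCloud.

```perl
use strict;
use warnings;

# Hypothetical helper: one tag becomes one <span>.
# Dimension 1 (font size) comes from the log-scaled count,
# dimension 2 (text colour) from a precomputed hue in degrees.
# Real tag names would need HTML-escaping; these samples are plain ASCII.
sub render_tag {
    my ($tag, $max_count) = @_;
    my $scale = $max_count > 1 ? log($tag->{count}) / log($max_count) : 0;
    my $size  = int(12 + 24 * $scale);    # 12px .. 36px
    return sprintf '<span style="font-size: %dpx; color: hsl(%d, 70%%, 35%%)">%s</span>',
        $size, $tag->{hue}, $tag->{name};
}

# Dimension 3 (tag order) is whatever comparison the caller passes in.
sub render_cloud {
    my ($tags, $cmp) = @_;
    my $max = 1;
    for my $t (@$tags) { $max = $t->{count} if $t->{count} > $max }
    return join ' ', map { render_tag($_, $max) }
                     sort { $cmp->($a, $b) } @$tags;
}

# Made-up sample data.
my @tags = (
    { name => 'london', count => 30, hue => 20,  last_used => 300 },
    { name => 'bamboo', count => 1,  hue => 200, last_used => 100 },
    { name => 'recipe', count => 5,  hue => 120, last_used => 200 },
);

# Alphabetical order, as preferred above.
print render_cloud(\@tags, sub { $_[0]{name} cmp $_[1]{name} }), "\n";

# Most recently used first, if order is to carry information instead.
print render_cloud(\@tags, sub { $_[1]{last_used} <=> $_[0]{last_used} }), "\n";
```

Passing the comparison in as a code reference keeps the ordering independent of the other two dimensions, so alphabetical order stays a one-line choice.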

  • Any other ideas? How much further can we push this boat?

    You could represent the 'freshness' of a tag (how recently it's been assigned to new objects) by blinking the tag name more or less quickly.

    I'm not suggesting you should. Merely that you could :-)

  • Interestingly, I find the "before" more pleasant to look at. I think I _like_ the whitespace, but yay power law. With luck, I'll have more time to play with the tri-axis tagcloud this week.
    • Ah well, the nice thing about the CSS is that you can remove the squishing together with one easy step ;-)
  • > the font size, the text colour and the tag order

    How about splitting the colour into colour and intensity? Makes it a bit harder to read, but then again, you can't use all the possible text colours, either, since there has to be some contrast.

  • So, if we take the example of Flickr, we could have:

    a) Newness. Represented by the order of the tags
    b) Quantity of photos tagged with the tag. Represented by Size
    c) Interestingness, represented by intensity of the tag. Tags associated with photos that are on average more interesting are darker (or lighter, depending on your background colour)
    d) Popularity. Tags that are associated with photos that are viewed more than others are "warmer".

    So, for example, my tag argh [flickr.com] would be somewhere near the start of
  • The new layout, while looking better, makes it a little more confusing which tag you're pointing at with your mouse. For example, the t and p of potato and pasta as seen here [2shortplanks.com]. Some sweet sweet mouseover action (which could be implemented in pure CSS) would be nice.
  • You could (alphabetically) order tags by themes in sub-clouds or something. Say you have photos (or whatever) with tags like London, Paris, Amsterdam, you could 'add' them to a category or theme. These are all cities, so you could make a subcloud of cities, and a subcloud of something like 'Friends' and 'family'. You'd need to tell it which tags belong with which groups, and have the option of placing one tag in multiple groups, and then you'd get really interesting clouds, I think.

    Using colour and intensity
  • You might want to check out Zipf's Law, which describes such rankings. :)