
2shortplanks (968)

http://2shortplanks.com/
AOL IM: trelane2sp
Yahoo! ID: trelane2sp

Mark Fowler has never been the same since he was elected leader of the London Perl Mongers. The strain manifests itself mainly in releasing various [cpan.org] modules [cpan.org] to CPAN, giving talks [2shortplanks.com], and use of the Trelane nick on #london.pm for endless procrastination. Doctors are still seeking a cure.
Tuesday October 02, 2001
09:23 AM

P3P and CP

[ #848 ]
Yep, so I haven't written a journal entry in quite a while.

This is because I've been really busy doing some interesting things at work. Which I can talk about. Because it's Perl, and it's going to be open source. Yey!

I don't know how many of you reading this will have heard of P3P. P3P (the Platform for Privacy Preferences) is a W3C working draft for an XML version of a site's privacy policy. It's designed so that a web browser can go and grab the site's P3P policy first and then, based on what's in the policy, decide what other information to relay in its following interactions with that web site.

This is great. A site declares that it's going to use all your data to telemarket to you? Configure your browser not to send your email address then! Of course, they could always lie in their P3P policy, but in certain countries that could potentially be construed as badness under the law (though I'm Not A Lawyer). Anyway, P3P is a Good Thing, and we should be doing as much as possible to promote it.

Even with our fatter pipes, throwing around an XML document with each request to a site is a bad idea bandwidth-wise (especially if you're getting only one image from a site), so the P3P spec has something called a "Compact Policy." This is like a "geek code" version of the original XML document which is small enough to be encoded directly in the HTTP headers of each response. Of course the summary doesn't contain nearly as much info as the main policy - at the moment it only concerns itself with cookies.
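To make the "geek code" idea concrete, here's a rough sketch of pulling the compact policy out of a response header. It's in Python rather than Perl purely for illustration, and it assumes the header value looks like `policyref="...", CP="..."`, which is my understanding of the spec. The function name `parse_p3p_header` and the example URL are made up.

```python
import re

def parse_p3p_header(value):
    """Split a P3P header value into (policyref, list of CP tokens).

    Assumes fields of the form name="value" separated by commas;
    anything else in the header is ignored.
    """
    fields = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', value))
    return fields.get("policyref"), fields.get("CP", "").split()

# Hypothetical header value, as a server might send it
header = 'policyref="http://example.com/w3c/p3p.xml", CP="NOI CUR ADM OUR NOR"'
policyref, tokens = parse_p3p_header(header)
print(policyref)
print(tokens)
```

Each three-letter token stands in for an element of the full XML policy, which is why the whole thing fits comfortably in a header.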

IE6 has settings to deal with compact policies. If you go to your test IE6 machine and look at your privacy settings you'll see that by default IE is set to refuse cookies from third parties (people serving content that's not in the same domain as that in your URL bar) that don't have a compact policy. Wow! Everyone and their dog suddenly needs CP support.
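The default rule described above boils down to something like the following sketch (Python, for illustration only). This is not IE6's actual logic: the "same domain" test here is a crude comparison of the last two labels of the host name, and both function names are invented.

```python
def same_site(page_host, cookie_host):
    """Crude check: compare the last two labels of each host name.

    A real browser does something smarter; this is an assumption
    made to keep the sketch short.
    """
    return page_host.lower().split(".")[-2:] == cookie_host.lower().split(".")[-2:]

def accept_cookie(page_host, cookie_host, compact_policy):
    """First-party cookies pass; third-party cookies need a compact policy."""
    third_party = not same_site(page_host, cookie_host)
    return (not third_party) or bool(compact_policy)

print(accept_cookie("www.example.com", "images.example.com", None))
print(accept_cookie("www.example.com", "ads.tracker.net", None))
print(accept_cookie("www.example.com", "ads.tracker.net", "NOI CUR ADM OUR NOR"))
```

So an ad server or image host on another domain that sends no CP header simply loses its cookies under the default settings.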

This is where Perl steps in. From a random sample of sites listed as implementing P3P, a large number of them have problems with P3P and CP documents. In short, their CP often doesn't match the P3P document which it's supposed to have originated from...whoops. If only there was some way to automagically convert from a P3P document (which is in a handy XML form) into a CP...

So I've started writing a handy tool to do just that (and that alone) from the command line. This is all implemented as a Perl module that makes much use of XML::XPath to do the actual logic. It's surprisingly easy to do - XPath certainly is powerful.
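To give a flavour of the XPath approach, here's a minimal sketch. It is not the module itself: it's in Python, using the limited XPath support in the standard library's ElementTree instead of Perl's XML::XPath. The token table is only a small subset of the mapping as I remember it from the spec, the sample policy leaves out the XML namespace and most of the required elements, and the suffixes for opt-in/opt-out purposes are ignored.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of the element-to-token mapping, not the full table.
TOKENS = {
    "ACCESS": {"nonident": "NOI", "all": "ALL", "none": "NON"},
    "PURPOSE": {"current": "CUR", "admin": "ADM", "develop": "DEV",
                "telemarketing": "TEL"},
    "RECIPIENT": {"ours": "OUR", "unrelated": "UNR"},
    "RETENTION": {"no-retention": "NOR", "indefinitely": "IND"},
}

def compact_policy(policy_xml):
    """Build a compact policy string from a (simplified) P3P policy."""
    root = ET.fromstring(policy_xml)
    tokens = []
    for parent, mapping in TOKENS.items():
        # e.g. ".//PURPOSE/*" selects every child element of any PURPOSE
        for child in root.findall(f".//{parent}/*"):
            token = mapping.get(child.tag)
            if token and token not in tokens:
                tokens.append(token)
    return " ".join(tokens)

# Hypothetical, heavily trimmed policy with no namespace
SAMPLE = """
<POLICY>
  <ACCESS><nonident/></ACCESS>
  <STATEMENT>
    <PURPOSE><current/><admin/></PURPOSE>
    <RECIPIENT><ours/></RECIPIENT>
    <RETENTION><no-retention/></RETENTION>
  </STATEMENT>
</POLICY>
"""
print(compact_policy(SAMPLE))  # prints: NOI CUR ADM OUR NOR
```

The whole conversion is really just "select these elements, look up their tokens", which is why XPath makes it so painless.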

Mainly this week I've been refactoring. I threw away my initial prototype and rewrote the documentation from scratch (so I know exactly what everything's supposed to do now.) I'm writing tests (boring, but useful) for each of the tags that go in a CP and then butchering code out of the prototype and reimplementing it in between the various sections of POD until the tests pass.

The biggest challenge I've had is deciding on a name for this module. I've asked around and heard a lot of points of view. In the end I gave up and submitted a summary of the entire debate to modules@perl.org. I've still to hear back from them...I probably gave them all a headache.

That's enough of a break now...should be uncodeblind again...back to the editor.

To Be Continued...