

Journal of inkdroid (3294)

Saturday June 14, 2003
04:07 PM


[ #12839 ]

I want two CPAN modules, which may or may not exist:


An XML::SAX handler that does what XML::Simple does. So if I have some XML like this:

    <foo a="1">
      <bar>baz</bar>
      <bar>bar</bar>
    </foo>

I would be able to do this:

    my $foo = XML::Filter::Simple->new();
    my $parser = XML::SAX::ParserFactory->parser( Handler => $foo );
    $parser->parse_string( $xml );

    ## same kind of data structure as XML::Simple
    print $foo->{ a };        # prints 1
    print $foo->{ bar }[0];    # prints baz
    print $foo->{ bar }[1];    # prints bar
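
For what it's worth, the core of such a handler is small. Here is a minimal sketch, assuming a made-up class called SimpleTreeSketch (it is not a CPAN module and not how XML::Simple works internally). It implements the three usual SAX callbacks and is fed hand-built events shaped like the ones a Perl SAX2 parser would send, so it runs without XML::SAX installed. One simplification: child elements always land in an array ref, which is closer to XML::Simple with ForceArray turned on than to its defaults.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handler class, for illustration only.
package SimpleTreeSketch;

sub new {
    my ($class) = @_;
    return bless { stack => [], tree => undef }, $class;
}

# SAX-style callback: open a new node and copy its attributes into it.
sub start_element {
    my ($self, $el) = @_;
    my %node;
    for my $attr (values %{ $el->{Attributes} || {} }) {
        $node{ $attr->{LocalName} } = $attr->{Value};
    }
    push @{ $self->{stack} },
        { name => $el->{Name}, node => \%node, text => '' };
}

# SAX-style callback: accumulate character data for the open element.
sub characters {
    my ($self, $chars) = @_;
    $self->{stack}[-1]{text} .= $chars->{Data} if @{ $self->{stack} };
}

# SAX-style callback: close the node and attach it to its parent.
sub end_element {
    my ($self, $el) = @_;
    my $frame = pop @{ $self->{stack} };

    # An element with no attributes and no children collapses to its text.
    my $value = %{ $frame->{node} } ? $frame->{node} : $frame->{text};

    if (my $parent = $self->{stack}[-1]) {
        # Simplification: children are always collected in an array ref.
        push @{ $parent->{node}{ $frame->{name} } }, $value;
    }
    else {
        $self->{tree} = $value;    # the root element just closed
    }
}

sub tree { return $_[0]{tree} }

package main;

# Hand-built events for <foo a="1"><bar>baz</bar><bar>bar</bar></foo>.
my $h = SimpleTreeSketch->new;
$h->start_element({
    Name       => 'foo',
    Attributes => { '{}a' => { LocalName => 'a', Value => 1 } },
});
for my $text (qw(baz bar)) {
    $h->start_element({ Name => 'bar', Attributes => {} });
    $h->characters({ Data => $text });
    $h->end_element({ Name => 'bar' });
}
$h->end_element({ Name => 'foo' });

my $foo = $h->tree;
print $foo->{a}, "\n";
print $foo->{bar}[0], "\n";
print $foo->{bar}[1], "\n";
```

Because each node is finished the moment its end_element fires, a handler along these lines could hand completed subtrees to a callback while the parse is still running, rather than waiting for the end of the document.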


I want to be able to keep up to date with the goings-on of my congressman and senators, and I want Perl to help me.

use Politics::US::Senator;
use Politics::US::Bill;

my $senator = Politics::US::Senator->new( 'Durbin, Richard' );
foreach my $vote ( $senator->votes() ) {

     my $decision = $vote->decision();
     my $billNum = $vote->billNumber();
     my $bill = Politics::US::Bill->new( $billNum );

     print
          "Today Durbin cast a vote of $decision regarding $billNum\n",
          "The bill's title is: ", $bill->title(), "\n",
          "And here is the content of the bill:\n",
          $bill->content();   # hypothetical accessor, like the rest of this wished-for API
}

Between the Senate website, Thomas, and WWW::Mechanize, this isn't so far-fetched at all.

  • XML::Simple can pretty much do that already. The difference is that in your example you ignore the return value from the parse and the hashref (or 'simple tree') ends up inside the handler object itself. But this is how you'd do it with XML::Simple:

        my $xs = XML::Simple->new(keyattr => {}, ...etc...);
        my $parser = XML::SAX::ParserFactory->parser( Handler => $xs );
        my $foo = $parser->parse_string( $xml );

        print $foo->{ a }; 

    • Woah, great! I guess I had an old version of XML::Simple lying around.
    • I would really like to have access to the 'simple tree' while I am filtering. The reason for this is that I am working with huge XML documents (of varying content), and I would like to be able to extract portions of them as the parse is proceeding. I imagine a Data::Dumper::Dumper call on the blessed object will reveal where the simple tree is being built up, so I can be bad and dig into the object myself. But I'd rather not do that. Any ideas?
      • You might want to look into XML::Filter::Dispatcher which I think can do what you want (ie: treat sections of the document as documents in their own right).

        Digging inside wouldn't actually help (and might harm your sanity), since all the handler does is accumulate the events in an XML::Parser Tree-style structure; once the whole document has been parsed, it uses the 'collapse' method to convert that to a simple tree.

      • A little late... you can do this using XML::Twig: the latest version lets you use the simplify method on any element of the tree. simplify gives you the same structure as XML::Simple. And of course you can use it during the parsing, so you can deal with parts of the tree.


        #!/usr/bin/perl -w
        use strict;
        use XML::Twig;
        use YAML;

        XML::Twig->new( twig_roots => { foo => \&foo })
                 ->parse( \*DATA);

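        sub foo
          { my( $t, $foo)= @_;
            # the post is cut off after "$fo"; what follows is an assumed completion
            my $data= $foo->simplify;   # same kind of structure as XML::Simple
            print Dump( $data);
            $t->purge;                  # free the part of the tree already handled
          }

        __DATA__
        <doc><foo a="1"><bar>baz</bar><bar>bar</bar></foo></doc>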

  • Well, I emailed the people running GIA about whether they had an API, but they haven't written me back. I'm sure that if they do then a module encapsulating the easiest uses of it can't be far behind, and if there isn't an API, I can't see why there wouldn't be one soon.

    You are what you think.
    • Neat. Let me know if you hear anything. In the meantime I'm seriously thinking of writing an interface to Thomas, with a nice OO interface and WWW::Mechanize in the background. Interested? I'd like to come up with the API first, and work from there.
      • Thomas looks neat (and very tentacle-y, in the amount of information it provides...), but I don't know doodly about WWW::Mechanize yet.

        I'd definitely be interested to hear about it -- I personally try to be a political agnostic, but it doesn't always work.


        You are what you think.