NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Perl and XML (Score:5, Insightful)

    Your assertion does not accurately summarize my experiences with Perl and XML.

    First, lots of Perl programmers have embraced XML. There was a period when the only module for parsing XML was XML::Parser, plus a few half-finished attempts at doing something differently. Today there are many polished alternatives for processing XML, including the interchangeable PerlSAX framework, which mimics SAX in Java. In fact, some ideas crop up first in Perl (or rather in Barrie Slaymaker's head) before they ar
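    To make the "interchangeable" point concrete, here is a minimal sketch of a SAX handler written against the XML::SAX interface. It assumes XML::SAX is installed; the ElementCounter package and the count_elements helper are hypothetical names made up for illustration. XML::SAX::ParserFactory hands back whichever SAX parser is preferred on the system (XML::SAX::PurePerl, XML::LibXML's SAX driver, XML::SAX::Expat, ...), so the handler itself never names a parser.

```perl
use strict;
use warnings;

# Hypothetical handler: counts start-element events by element name.
package ElementCounter;
use parent 'XML::SAX::Base';

sub start_element {
    my ($self, $element) = @_;
    # SAX2 element events carry the qualified element name in {Name}
    $self->{counts}{ $element->{Name} }++;
}

package main;
use XML::SAX::ParserFactory;

# Hypothetical helper: parse an XML string, return a hashref of element counts.
sub count_elements {
    my ($xml) = @_;
    my $handler = ElementCounter->new;
    # The factory picks the installed parser; the handler does not care which
    my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_string($xml);
    return $handler->{counts};
}

my $counts = count_elements('<doc><p>one</p><p>two</p></doc>');
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

    Swapping in a different SAX parser is a configuration matter for XML::SAX, not a code change in the handler.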

    • Re:Perl and XML (Score:4, Interesting)

      by barries (2159) on 2003.02.05 12:30 (#16739)
      It's a pain to screen scrape an HTML page with Perl, but it's more of a pain to do it in Java.

      Matt Sergeant, AxKit's father, cooked up a neat approach to this: use libxml2 (via XML::LibXML) to parse the HTML in libxml2's HTML and recover modes, then apply normal XML tools to it. I haven't tried it, but I'd like you to be able to do that and then use XML::Filter::Dispatcher to pluck strings out of the resulting XML stream using rules like:

          'string( foo/p )' => sub { print "foo/p contains '", xvalue, "'\n" },

      Anyone that wants to try this, I'll help; it's a neat use case.
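      A minimal sketch of the parse-HTML-then-treat-it-as-XML idea, assuming a reasonably recent XML::LibXML that provides load_html and its recover option. Note that it swaps XML::Filter::Dispatcher's streaming rules for a plain DOM XPath query, so it is a simplification rather than the Dispatcher-based pipeline described above; the extract_paragraphs helper and the div/p layout are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse possibly sloppy HTML with libxml2's HTML
# parser, then return the text of every <p> directly under $parent.
sub extract_paragraphs {
    my ($html, $parent) = @_;
    # recover => 2: repair broken markup and suppress the error messages
    my $doc = XML::LibXML->load_html(string => $html, recover => 2);
    # From here on it is an ordinary DOM, so normal XPath applies
    return map { $_->textContent } $doc->findnodes("//$parent/p");
}

# Unclosed <p> tags: the HTML parser closes each one when the next begins
my $html = '<html><body><div><p>first<p>second</div></body></html>';
print "div/p contains '$_'\n" for extract_paragraphs($html, 'div');
```

      Once the HTML has been repaired into a tree, any XML tool that accepts an XML::LibXML document (XPath, XSLT, a SAX stream) can take over from there.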

      I agree wholeheartedly that XML is being badly applied to many things (as in your bad grammars comment), and that it's also being misapplied to things where there are more appropriate technologies. I'm no fan of BXXP/BEEP or SOAP, for instance. (I may yet change my mind on BEEP if the toolset supporting it makes it less impenetrable.)

      - Barrie

      P.S. <blush/>. In reality, most of the ideas that crop up in my head were disproven loooong ago. I rediscover the obvious daily. It's like having intellectual Alzheimer's: I meet the same concepts anew each day.

      P.P.S. Anyone interested in Perl+XML should definitely check out Kip Hampton's Perl and XML articles. They range from the sublime to the sophisticated.

      • Matt Sergeant, AxKit's father, cooked up a neat approach to this: use libxml2 (via XML::LibXML) to parse the HTML in html and recover modes, then apply normal XML tools to it.

        Matt's mentioned this on more than one occasion. I always thought that libxslt/xsltproc was "broken" in its support for parsing HTML. I don't know how I came to that conclusion, but it must have been based on an early release of libxslt.

        Anyway, later that day, on Matt's urging, I wrote a quick little XSLT stylesheet to grep ou