NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

  • A very good list of truisms. However, there is the subtlest of subtle flaws in this list -- all categorical statements (including this one) are false. For example:

    Don't parse XML with regular expressions.

    Sometimes you do want to parse XML with regexes, but only in the most controlled of circumstances. Usually this involves munging huge quantities of data that are very rigidly formatted. If you can fully control the structure of XML inputs, and you tend to be reading inputs line-by-line (or bloc

    • It is possible to parse arbitrary XML with regular expressions. However, it can't be done line-by-line because tags can contain newlines. It must be done on the whole file (or have some smart buffering).

      There is a paper, http://www.cs.sfu.ca/~cameron/REX.html, which develops regular expressions for parsing XML.

      • Note that those patterns parse simple XML, not XML with namespaces. Parsing XML with namespaces purely by pattern matching is probably possible too, but it’d be a whole hell of a lot harder, and the patterns would be nasty monstrosities, far worse than the manageable beasts from that paper.

        • They will parse XML with namespaces. But they only break the XML into pieces: tags, comments, text, etc. They don't break tags down into names and attribute values, and they don't resolve namespace prefixes into canonical names.

          I suspect they aren't suitable for doing interesting operations. They could be used for stuff that works on the chunks, like removing comments.
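To make the "break the XML into pieces" idea concrete, here is a minimal sketch in Python. It is a simplified shallow tokenizer written for illustration, NOT the exact REX expression from Cameron's paper: it assumes no `>` inside attribute values and no DOCTYPE internal subset. The names `TOKEN_RE`, `shallow_tokens` and `strip_comments` are hypothetical. It runs over the whole document string (so a tag split across lines is still one chunk), labels each chunk, and leaves tags as opaque strings, so namespace prefixes such as `x:` are never resolved. Removing comments is then just dropping one kind of chunk.

```python
import re

# Simplified shallow-parsing pattern (an illustration, not the REX expression).
# Each alternative matches one whole chunk; DOTALL lets comments span lines,
# and [^<>] in the tag branch already matches newlines inside a tag.
TOKEN_RE = re.compile(
    r"""
      (?P<comment><!--.*?-->)          # comment
    | (?P<cdata><!\[CDATA\[.*?\]\]>)   # CDATA section
    | (?P<pi><\?.*?\?>)                # processing instruction / XML declaration
    | (?P<decl><![^>]*>)               # DOCTYPE etc. (assumes no internal subset)
    | (?P<tag></?[^<>]*>)              # start, end or empty-element tag (opaque)
    | (?P<text>[^<]+)                  # character data
    """,
    re.DOTALL | re.VERBOSE,
)


def shallow_tokens(doc):
    """Yield (kind, chunk) pairs covering the whole input string."""
    pos = 0
    while pos < len(doc):
        m = TOKEN_RE.match(doc, pos)
        if m is None:
            raise ValueError(f"unrecognized markup at offset {pos}")
        # Only the alternative that matched has a group set, so lastgroup names it.
        yield m.lastgroup, m.group()
        pos = m.end()


def strip_comments(doc):
    """Rebuild the document from every chunk except comments."""
    return "".join(chunk for kind, chunk in shallow_tokens(doc) if kind != "comment")


sample = """<?xml version="1.0"?>
<doc xmlns:x="urn:example"><!-- a comment
spanning lines --><x:item
    id="1">hello</x:item></doc>"""

for kind, chunk in shallow_tokens(sample):
    print(kind, repr(chunk))
print(strip_comments(sample))
```

Feeding the tokenizer a single line such as `spanning lines --><x:item` raises `ValueError`, because the `<x:item` tag only closes on the following line; that is the line-by-line problem in miniature. Note also that `<x:item\n    id="1">` comes back as one undivided `tag` chunk: splitting it into a name and attributes, or mapping `x` to `urn:example`, would need further work on top of this.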