All the Perl that's Practical to Extract and Report

  • by ziggy (25) on 2004.10.19 11:29 (#35327) Journal
    On the one hand, if you're using XML, you want to parse one syntax and be done. The reality is that's more fantasy than fact.

    The problem you're trying to solve here is colloquially known as "parsing the atom". That is, the XML tagging suffices to mark up the structure of some blob of data, but doesn't scale down to parse the data fragments inside that blob. For example, consider a nice XML document that has an attribute or a text block that contains a date. Is 'Tue Oct 19 12:19:16 EDT 2004' a chunk of text like 'Fred Flintstone', or a parsable date? And if it is a parsable date, what rules do you use to parse it?

    What about ISBNs? URLs? LaTeX fragments? Quoted HTML? Perl code? English sentences? PNGs?
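
    To make the date case concrete, here is a rough sketch in Python of what "parsing the atom" ends up looking like: the XML parser hands back plain strings, and a second, hand-written parser decides whether a string is a date. The document, the element names `author` and `posted`, the `ZONES` table, and the `parse_date_atom` helper are all made up for illustration, and the sketch assumes exactly one date layout and an English locale.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical document; the element names are assumptions for this sketch.
DOC = (
    "<entry>"
    "<author>Fred Flintstone</author>"
    "<posted>Tue Oct 19 12:19:16 EDT 2004</posted>"
    "</entry>"
)

# Assumed lookup table. Zone abbreviations are ambiguous in general,
# so a real application has to decide what each one means.
ZONES = {
    "EDT": timezone(timedelta(hours=-4), "EDT"),
    "EST": timezone(timedelta(hours=-5), "EST"),
}


def parse_date_atom(text):
    """Return an aware datetime if text matches the one layout we expect, else None."""
    parts = text.split()
    # Expected layout: weekday, month, day, time, zone, year.
    if len(parts) != 6 or parts[4] not in ZONES:
        return None
    try:
        # Parse everything except the zone token, which is handled by ZONES.
        naive = datetime.strptime(
            " ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y"
        )
    except ValueError:
        return None
    return naive.replace(tzinfo=ZONES[parts[4]])


# Step 1: the XML parser recovers the structure, but every value is just a string.
root = ET.fromstring(DOC)
atoms = {child.tag: child.text for child in root}

# Step 2: a second parser decides which strings are dates.
parsed = {tag: parse_date_atom(text) for tag, text in atoms.items()}
```

    Nothing in the XML itself says that `posted` holds a date and `author` does not; that knowledge lives entirely in the second step, and every other kind of atom needs its own version of it.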

    It's a hard problem. You could solve it and use XML as your one and only syntax, but that means paying an extra 800% in effort for an additional 2% of value to the programmer (who generally needs to solve this problem once and move on).

    But on the other hand, throwing a CSV island in the middle of a sea of XML is like bringing a bottle of Bud Light when touring the Guinness brewery just so you have something to drink. It might be a logical and defensible choice in some universe, but not this one.