NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

  • not well-formed XML

    I thought there was only well-formed XML: anything that is not well-formed is simply not XML. The intent was to avoid the tag soup and Do-What-I-Think-You-Meant heuristics that got us to the HTML we have today.

    Hence it sounds like even this so-called "RDF" that they are producing is fundamentally broken, if RDF is XML, and XML is well-formed. Not that this helps you, of course :-(

    • The RDF is well-formed; it's the Web site which is not. The RDF was very confusing, though, and I simply don't know it well enough to manually use an XML parser to get all of the data I need.

      • Extracting information from RDF/XML with an XML parser is a fool’s errand. RDF is a graph model, and RDF/XML is merely one (fairly TMTOWTDI-heavy) representation of it. It is possible to design XML documents so that they can also be parsed as RDF, but if the data was modelled in RDF with no consideration given to the XML parsing case, then trying to parse its RDF/XML representation is likely to produce code more analogous to a regex-based HTML scraper than a parser.
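
        To make the point above concrete, here is a minimal sketch in Python using only the standard library. The subject URI `http://example.org/post`, the title value, and the helper names `naive_titles` and `triples` are made up for illustration. Both documents encode the same single triple, once as a property element and once as a property attribute, and the toy `triples` function handles only these two forms, not RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Two RDF/XML serializations of the same triple (hypothetical data):
#   <http://example.org/post> dc:title "Hello"
AS_ELEMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/post">
    <dc:title>Hello</dc:title>
  </rdf:Description>
</rdf:RDF>"""

AS_ATTRIBUTE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/post" dc:title="Hello"/>
</rdf:RDF>"""


def uri(qname):
    """Turn ElementTree's '{namespace}local' form into a plain URI string."""
    return qname[1:].replace("}", "", 1)


def naive_titles(xml_text):
    """Scraper-style lookup: only sees dc:title written as an element."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{DC}}}title")]


def triples(xml_text):
    """Toy extractor for the two forms above; NOT a real RDF/XML parser."""
    found = set()
    root = ET.fromstring(xml_text)
    for node in root.iter(f"{{{RDF}}}Description"):
        subject = node.get(f"{{{RDF}}}about")
        # Property attributes: namespaced attributes outside rdf:.
        for name, value in node.attrib.items():
            if name.startswith("{") and not name.startswith(f"{{{RDF}}}"):
                found.add((subject, uri(name), value))
        # Property elements: child elements with literal text content.
        for child in node:
            found.add((subject, uri(child.tag), child.text))
    return found


if __name__ == "__main__":
    print(naive_titles(AS_ELEMENT))    # finds the title
    print(naive_titles(AS_ATTRIBUTE))  # finds nothing
    print(triples(AS_ELEMENT) == triples(AS_ATTRIBUTE))
```

        The element lookup works on the first document and silently finds nothing in the second, even though both describe the same graph. Covering every variant RDF/XML allows (typed nodes, nested descriptions, `rdf:resource`, blank nodes, and so on) by hand is where the XML-parser approach starts to resemble a scraper; a real RDF parser hands you triples regardless of which spelling the producer chose.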