Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Please elaborate (Score:2)
Do you mean to say that if you find a <LI> tag without an enclosing <UL> or <OL>, you insert one of those?
Re:Please elaborate (Score:2)
If <LI> cannot fit within the current open tag (such as <A>), walk up the stack of open tags until you can find a spot where you can open it. If, for example, you find an open <OL>, close all open tags until the <OL> is at the top of the stack. (Presumably, that means you forgot a </LI> somewhere, since they cannot nest.)
If there is no spot in the stack that you can unwind to in order to open a <LI> tag, try to open the sequence <OL> <LI>, and find the cl
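The stack walk described above can be sketched roughly as follows. This is a minimal illustration in Python, not the poster's actual code: the `ALLOWED` table is a hypothetical, heavily trimmed stand-in for the real nesting rules, and `find_host` and `open_tag` are invented names.

```python
# Hypothetical, trimmed containment table: open element -> children it may hold.
# A real table would be generated from the HTML schema, not typed by hand.
ALLOWED = {
    "body": {"ul", "ol", "p", "a"},
    "ul": {"li"},
    "ol": {"li"},
    "li": {"p", "a", "ul", "ol"},
    "p": {"a"},
    "a": set(),
}


def find_host(stack, tag):
    """Return the index of the innermost open element that may contain
    `tag`, or None if no open element can."""
    for depth in range(len(stack) - 1, -1, -1):
        if tag in ALLOWED.get(stack[depth], set()):
            return depth
    return None


def open_tag(stack, tag, out):
    """Open `tag`, closing or inserting elements so the result nests legally.

    `stack` is the list of currently open elements (innermost last) and
    `out` collects the tags that would be emitted.
    """
    depth = find_host(stack, tag)
    if depth is None and tag == "li":
        # Nothing on the stack can take an <li>: open an <ol> first.
        open_tag(stack, "ol", out)
        depth = len(stack) - 1
    if depth is None:
        raise ValueError("no legal place to open <%s>" % tag)
    # Close everything above the host so the host is on top of the stack.
    while len(stack) - 1 > depth:
        out.append("</%s>" % stack.pop())
    stack.append(tag)
    out.append("<%s>" % tag)


# A second <li> arrives while <a> and the previous <li> are still open.
stack, out = ["body", "ol", "li", "a"], []
open_tag(stack, "li", out)
print(out)    # ['</a>', '</li>', '<li>']
print(stack)  # ['body', 'ol', 'li']
```

Calling `open_tag(["body", "p"], "li", out)` takes the other branch: no open element accepts `<li>`, so the sketch closes the `<p>`, opens an `<ol>`, and then opens the `<li>` inside it.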
Re:Please elaborate (Score:2)
So, where do you get your hierarchy of allowable nesting of tags from? The data will probably originate in the HTML DTD, but how do you feed it into your code? What form does the data structure take?
Re:Please elaborate (Score:2)
John Cowan wrote two schema languages for TagSoup: one for the scanner, and one for the tag parser. He has an XSLT stylesheet that converts the HTML schema into a Java class. I wrote a simpler stylesheet that converts this schema into a hash-of-hashes. ;-)
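For readers wondering what such a structure might look like, here is a rough guess, written as a Python dict-of-dicts standing in for the Perl hash-of-hashes. The key names (`parent`, `member_of`, `contains`), the group names, and the entries themselves are all assumptions made for illustration; the layout the stylesheet actually generates may differ.

```python
# Hypothetical shape: element name -> properties of that element.
#   parent    - element to open implicitly if this one has no legal host
#   member_of - content groups the element belongs to
#   contains  - content groups the element may hold
SCHEMA = {
    "body": {"parent": "html", "member_of": {"root"},   "contains": {"block", "inline"}},
    "ol":   {"parent": "body", "member_of": {"block"},  "contains": {"item"}},
    "ul":   {"parent": "body", "member_of": {"block"},  "contains": {"item"}},
    "li":   {"parent": "ol",   "member_of": {"item"},   "contains": {"block", "inline"}},
    "p":    {"parent": "body", "member_of": {"block"},  "contains": {"inline"}},
    "a":    {"parent": "body", "member_of": {"inline"}, "contains": set()},
}


def can_contain(parent, child):
    """True if `child` belongs to at least one group that `parent` may hold."""
    return bool(SCHEMA[parent]["contains"] & SCHEMA[child]["member_of"])


print(can_contain("ol", "li"))  # True
print(can_contain("a", "li"))   # False
```

Keeping the rules as plain data means the parser only needs one generic lookup like `can_contain`, and the table can be regenerated whenever the schema changes.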
wild (Score:2)
Re:wild (Score:2)
My "tag soup schema -> perl" stylesheet can be found at http://www.panix.com/~ziggy/schema.xsl.