All the Perl that's Practical to Extract and Report

  • Can you please explain, maybe with an example, what you mean by step 3? It's a mystery to me right now.
    If this tag cannot be placed under any tags in the stack of open tags, look for a tag in the stack that can contain this tag's nominal parent. Remember this tag, and repeat the process with this tag's parent.

    Do you mean to say that if you find an <LI> tag without an enclosing <UL> or <OL>, you insert one of those?

    • Right.

      If <LI> cannot fit within the current open tag (such as <A>), walk up the stack of open tags until you find a spot where you can open it. If, for example, you find an open <OL>, close all open tags until the <OL> is at the top of the stack, then open the <LI> there. (Presumably, that means you forgot a </LI> somewhere, since <LI> tags cannot nest directly inside each other.)

      If there is no spot anywhere in the stack that can accept an <LI> tag, try to open the sequence <OL> <LI> instead, and find the closest spot that can handle an open <OL> tag. Repeat the process above, adding one more parent each time, until there's a spot to insert the whole sequence.

      This algorithm, though surprisingly simple, generally works to fill in the missing bits of tag soup. For example, when a bare <TD> is found, the sequence <TABLE> <TBODY> <TR> <TD> is opened. It also helps to fix up the infamous misnested <B> <I> </B> </I>.
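
      The steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: the ALLOWED_CHILDREN and NOMINAL_PARENT tables are made-up, heavily abridged stand-ins for a real content model, the names open_tag and close_tag are hypothetical, and the close-tag handling is my own guess at how the <B> <I> </B> </I> case gets repaired.

```python
# Hypothetical, abridged content model: parent -> tags it may directly contain.
ALLOWED_CHILDREN = {
    "body": {"a", "b", "i", "ol", "ul", "table"},
    "a": {"b", "i"},
    "b": {"a", "i"},
    "i": {"a", "b"},
    "ol": {"li"},
    "ul": {"li"},
    "li": {"a", "b", "i", "ol", "ul", "table"},
    "table": {"tbody"},
    "tbody": {"tr"},
    "tr": {"td"},
    "td": {"a", "b", "i", "ol", "ul", "table"},
}

# Hypothetical "nominal parent" to try when a tag has nowhere to go.
NOMINAL_PARENT = {"li": "ol", "td": "tr", "tr": "tbody", "tbody": "table"}


def open_tag(stack, tag, out):
    """Open `tag`, closing or inserting tags so that it nests legally."""
    pending = [tag]  # sequence of tags to open, outermost first
    while True:
        outer = pending[0]
        # Walk from the innermost open tag towards the root.
        for depth in range(len(stack) - 1, -1, -1):
            if outer in ALLOWED_CHILDREN.get(stack[depth], set()):
                # Close everything above the tag that can hold `outer`.
                while len(stack) > depth + 1:
                    out.append("</%s>" % stack.pop())
                for t in pending:
                    stack.append(t)
                    out.append("<%s>" % t)
                return
        # No spot found: prepend the nominal parent and try again.
        parent = NOMINAL_PARENT.get(outer)
        if parent is None:
            return  # nothing can hold it, so drop the tag (an assumption)
        pending.insert(0, parent)


def close_tag(stack, tag, out):
    """Assumed close handling: close down to the matching open tag."""
    if tag not in stack:
        return  # stray close tag, ignore it
    while True:
        top = stack.pop()
        out.append("</%s>" % top)
        if top == tag:
            return


stack, out = ["body"], []
open_tag(stack, "td", out)
print(" ".join(out))  # <table> <tbody> <tr> <td>
```

      With the same tables, opening an <LI> while the stack is body, ol, li, a closes the <A> and the earlier <LI> before opening the new one, and feeding in <B> <I> </B> </I> yields <b> <i> </i> </b>, with the stray </I> ignored.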