
All the Perl that's Practical to Extract and Report

  • Never failing with an error message is exactly what the most heavily used parsers today do: the parsers that feed the HTML rendering engines in our browsers. Their goal is precisely that: always render something, never bail out with a parsing error.

    • Huh - don't know why I didn't think of the browsers when I wrote that. The browser authors must have problems a hundred times thornier than I did.

      There's no question that it would be simpler from a programmer's perspective just to refuse to render if some set of grammar rules is not strictly followed. In most systems this is actually a requirement from a safety perspective, but in visual rendering like HTML or wiki markup, the temptation to forgive and forget is strong... especially when several browsers

      • No one ever thinks of the browsers. :-) The most successful computing platform ever, by a yawning margin, and paradoxically enough the most casually overlooked one by just as yawning a margin.

        As for guessing vs catching fire, the problem in the case of the web is that the user who gets to see the error is the one least capable of fixing it. So I don’t see how browsers could avoid lax parsing, even in an ideal world where almost all markup was valid (as opposed to the real one, where something like 99.99% of markup is invalid).

        But for something like wiki markup where the author is close at hand, I would favour a mixed approach: use a forgiving parser during authoring that tries to make a sensible guess when it encounters an error, but then ask the user whether the guess is correct, providing a corrected version of the input that they can easily rubber-stamp. That way, you can ensure that data is always clean before you commit it to storage. The benefit is that you run no risk of diverging interpretations of that data when you pass it to different compliant parsers.

        Of course, this is even more complex than “just” writing a forgiving parser and not caring about clean data.
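
The mixed approach described above could look roughly like the following. This is only a minimal sketch under loud assumptions: the toy syntax (a `**` bold delimiter that must be balanced on each line) and the names `is_valid`, `repair`, and `commit` are all hypothetical and do not come from any real wiki engine.

```python
from typing import Callable, List, Tuple

MARK = "**"  # hypothetical bold delimiter of a toy wiki syntax


def is_valid(text: str) -> bool:
    """Strict check: every line must hold an even number of bold markers."""
    return all(line.count(MARK) % 2 == 0 for line in text.split("\n"))


def repair(text: str) -> Tuple[str, List[str]]:
    """Forgiving pass: never fails; closes dangling markers and logs each guess."""
    fixed: List[str] = []
    guesses: List[str] = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line.count(MARK) % 2 == 1:
            # Guess: the author forgot to close the bold span on this line.
            line = line.rstrip() + MARK
            guesses.append(f"line {number}: closed unterminated bold")
        fixed.append(line)
    return "\n".join(fixed), guesses


def commit(
    text: str,
    confirm: Callable[[str, List[str]], bool],
    storage: List[str],
) -> bool:
    """Store text only if it is clean, or if the author approves the repair."""
    corrected, guesses = repair(text)
    if guesses and not confirm(corrected, guesses):
        return False  # author rejected the guess; nothing is stored
    assert is_valid(corrected)  # storage never sees markup a strict parser rejects
    storage.append(corrected)
    return True


if __name__ == "__main__":
    saved: List[str] = []
    # An interactive front end would show the corrected text and ask for approval;
    # here the callback simply rubber-stamps every guess.
    commit("hello **world", lambda corrected, guesses: True, saved)
    print(saved)
```

The forgiving parser only runs while the author is still around to approve its guesses, so whatever reaches storage already passes the strict check and any compliant parser should read it the same way.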