//...
Element sp;
for (sp = theStack; sp != null; sp = sp.next()) {
    if (sp.name().equals(name)) break;
}
if (sp == null) return; // unknown etag, do nothing
//....
Here is the same operation, expressed more naturally in Perl:
## This end tag closes a tag that isn't open. Ignore it.
return unless grep { $_ eq $name } @$stack;
Granted, these are philosophical and stylistic differences. My quibbles could be with the language, with the generally accepted idioms for Java programming, or with the author of this code. (Actually, I don't have any issues with the author; I'm just being complete and highlighting the possibilities.)
Regardless of my personal differences, this example highlights the benefit of writing clear, concise code. The C-style for loop obscures the intent by micromanaging the problem and focusing on the mechanics. The single-statement Perl equivalent nearly hides the mechanics while emphasizing the intent (return unless you find something).
You may look at this and say, "So what? It's just a different style preference." In the small, you're somewhat correct. However, I've been looking at this code for some time now, and these little differences accumulate, turning a program that should be easy into something hard to express and hard to understand.
That is not the best way to write something meant for people to understand, and only incidentally for a computer to execute...
PS: Here's the tagsoup algorithm in a nutshell:
The real power is in the last bit: the algorithm to add extra events to produce well-formed output. Here's how that works:
The algorithm is fairly simple, can always succeed (or just ignore a tag in woefully broken HTML), and works in a streaming fashion. Unfortunately, this streaming algorithm has no lookahead, so it will deform your input to produce something well-formed that only vaguely approximates the original, even if the result does not render correctly. For example, a sequence of table, tr, form, td will result in a sequence of table, tr,
every loop a method (Score:2)
In one of his talks (Enterprise Perl?) James Duncan discussed readable code and gave the excellent advice that every loop should be a method. I find myself doing this more with Java than Perl, probably because I hit the mental ceiling for method length sooner with Java's verbosity. So I'd typically translate your example to a method like:
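A minimal sketch of such a method, not the original snippet from this comment. It assumes a hypothetical `Element` class with the `name()` and `next()` accessors used in the loop at the top of the entry; `OpenTags`, `isOpen`, and `isNotOpen` are names made up for illustration.

```java
final class OpenTags {
    /** Hypothetical stand-in for the parser's linked stack of open elements. */
    static final class Element {
        private final String name;
        private final Element next;

        Element(String name, Element next) {
            this.name = name;
            this.next = next;
        }

        String name() { return name; }
        Element next() { return next; }
    }

    /** The loop lives here, so the caller only has to state the intent. */
    static boolean isOpen(Element theStack, String name) {
        for (Element sp = theStack; sp != null; sp = sp.next()) {
            if (sp.name().equals(name)) {
                return true;
            }
        }
        return false;
    }

    /** Negated twin, standing in for Perl's "unless". */
    static boolean isNotOpen(Element theStack, String name) {
        return !isOpen(theStack, name);
    }

    public static void main(String[] args) {
        // A stack with "tr" on top of "table".
        Element theStack = new Element("tr", new Element("table", null));
        System.out.println(isOpen(theStack, "table"));  // prints "true"
        System.out.println(isNotOpen(theStack, "td"));  // prints "true"
    }
}
```

The call site then shrinks to `if (OpenTags.isNotOpen(theStack, name)) return;`, which is about as close to `return unless ...` as Java gets.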
One side-effect of Java's not having unless is that I tend to write both 'isSomething' and 'isNotSomething' for readability, especially because an
Whitespace (Score:2)
vs
And they literally didn't see the difference. It's quite depressing.
-Dom
Re:Whitespace (Score:2)
@JAPH = qw(Hacker Perl Another Just);
print reverse @JAPH;
Hash (Score:2)
Please elaborate (Score:2)
Do you mean to say that if you find a <LI> tag without an enclosing <UL> or <OL>, you insert one of those?
Re:Please elaborate (Score:2)
If <LI> cannot fit within the current open tag (such as <A>), walk up the stack of open tags until you can find a spot where you can open it. If, for example, you find an open <OL>, close all open tags until the <OL> is at the top of the stack. (Presumably, that means you forgot a </LI> somewhere, since they cannot nest.)
If there is no spot in the stack where you can deform it to open a <LI> tag, try to open the sequence <OL> <LI>, and find the cl
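Here is a rough Java sketch of that stack walk, under loud assumptions: the `ALLOWED_PARENTS` and `DEFAULT_PARENT` tables are invented for illustration (real data would come from the tag soup schema), `SoupSketch` and `openTag` are hypothetical names, and the fallback after synthesizing a parent is only a guess at one reasonable behaviour, not TagSoup's actual code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SoupSketch {
    // Hypothetical nesting table: tag -> parents that may contain it directly.
    static final Map<String, Set<String>> ALLOWED_PARENTS = Map.of(
        "li", Set.of("ol", "ul"),
        "ol", Set.of("body", "li"),
        "ul", Set.of("body", "li"),
        "a", Set.of("body", "li"));

    // Hypothetical parent to synthesize when no open tag can hold the new one.
    static final Map<String, String> DEFAULT_PARENT = Map.of("li", "ol");

    final Deque<String> stack = new ArrayDeque<>(); // head = innermost open tag
    final List<String> events = new ArrayList<>();  // emitted open/close events

    SoupSketch(String root) {
        stack.push(root);
        events.add("<" + root + ">");
    }

    /** Opens a tag, deforming the stack as needed; returns false if the tag is ignored. */
    boolean openTag(String tag) {
        Set<String> parents = ALLOWED_PARENTS.getOrDefault(tag, Set.of());

        // Walk up the stack looking for any open tag that may contain this one.
        boolean found = false;
        for (String candidate : stack) {
            if (parents.contains(candidate)) {
                found = true;
                break;
            }
        }

        if (found) {
            // Close everything sitting above the spot we found.
            while (!parents.contains(stack.peek())) {
                events.add("</" + stack.pop() + ">");
            }
        } else {
            // No spot anywhere: try to open a synthesized parent first (assumption).
            String parent = DEFAULT_PARENT.get(tag);
            if (parent == null || !openTag(parent)) {
                return false; // nothing works, so ignore the tag
            }
        }

        stack.push(tag);
        events.add("<" + tag + ">");
        return true;
    }

    public static void main(String[] args) {
        SoupSketch soup = new SoupSketch("body");
        soup.openTag("ol");
        soup.openTag("li");
        soup.openTag("a");
        soup.openTag("li"); // cannot nest in <a> or <li>, so both get closed
        System.out.println(String.join(" ", soup.events));
        // prints "<body> <ol> <li> <a> </a> </li> <li>"
    }
}
```

Starting from a bare `body`, `openTag("li")` finds no usable parent, so the sketch emits `<ol>` and then `<li>`. A tag missing from both tables is simply ignored.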
Re:Please elaborate (Score:2)
So, where do you get your hierarchy of allowable nesting of tags from? The data will probably originate in the HTML DTD, but how do you feed it into your code? What form does the data structure take?
Re:Please elaborate (Score:2)
John Cowan wrote two schema languages for tag soup - one for the scanner, and one for the tag parser. He has an XSLT stylesheet that converts the HTML Schema into a Java class. I wrote a simpler stylesheet that converts this schema into a hash-of-hashes. ;-)
wild (Score:2)
Re:wild (Score:2)
My "tag soup schema -> perl" stylesheet can be found at http://www.panix.com/~ziggy/schema.xsl [panix.com].
Perl 6; also, Whither the code? (Score:1)
In Perl 5.10 we get the smartmatch, so the mechanics will be completely hidden:
In any case, did you ever finish porting the thing to Perl? If not, can I still have the code you have so far?