
All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • As far as I can see, MarkovBlogger reads its own entries. Bug or feature?
    • It should not be reading its own entries. I do filter MB entries. However, if other blogs quote MB, there's little I can do about that. It is very, very difficult to debug a particular entry to see what went wrong. I will have a look.

      • I think this snake is indeed eating its own tail.

        To debug this, I first grepped through all of the source material, which is organized into subdirectories and files.

        A typical source file looks like this:

        «subject: OSCON - /$3(.*)$/

        I write this on Saturday. My last entry was Wednesday. Thursday is a happy blur now. I know I went to some good talks, but I can't remember them now, even on pain of death. What I do have a clear recollection of is going to the Mummy Cafe with Quinn, Danny [oblomovka.com], David Blank-Edelman, Gnat and Jenine. Like the queen's tomb in the Khufu's (neé Cheop's) Pyramid, this restaurant was small, subterranean and devoid of treasure. However, the food was vaguely greek/mediterranean and so was easy to consume. Although no alcohol was imbibed there, for some reason conversation at the table was preempted as we all watched the ice cube that was impaled by David and hung across his glass slowly melt away to the point of failure.

        At the time, this seemed very, very important. Maybe the food included some kind of "pharaoh's surprise."»

        I looked through all of these entries with this command:

        find . -exec 'grep' '-l' '\[MarkovBlogger\]' '{}' ';'

        Only my own MB entries were found. This argues strongly that markov.pl isn't filtering the entries correctly. Here's the relevant section of code:

        for my $file (@ARGV) {

          while (<>) {
            if (/^subject:/) {
              s/^subject://;
              next if /^\s*\[MarkovBlogger\]/;
              fill_table(table   => $subject,
                         state   => $s_in,
                         line    => $_,
                         'keys'  => \@last_subject_keys
                        );
            } else {
              fill_table(table   => $body,
                         state   => $s_in,
                         line    => $_,
                         'keys'  => \@last_body_keys,
                        );
            }
          }
        }

        As you can see, the code appears to be looking for subject lines that contain the distinct sentential phrase. However, a closer look reveals the ugly truth: only that subject line is skipped; the rest of the file is still processed!
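
        To make that concrete, here is a minimal sketch of the same loop shape run against a single in-memory entry. The sample text and the @subjects and @bodies arrays are made up for illustration; they stand in for the real source file and for whatever fill_table() does with each line.

        ```perl
        use strict;
        use warnings;

        # Hypothetical stand-in for one source file: an MB subject line plus a body.
        my $entry = "subject: [MarkovBlogger] Snake eats tail\n"
                  . "First body line\n"
                  . "Second body line\n";

        # Stand-ins for the tables that fill_table() would populate.
        my (@subjects, @bodies);

        open my $in, '<', \$entry or die "cannot open in-memory file: $!";
        while (<$in>) {
          if (/^subject:/) {
            s/^subject://;
            next if /^\s*\[MarkovBlogger\]/;  # skips this one line only
            push @subjects, $_;
          } else {
            push @bodies, $_;                 # body lines still land here
          }
        }
        close $in;

        printf "subjects kept: %d, body lines kept: %d\n",
               scalar @subjects, scalar @bodies;
        # prints "subjects kept: 0, body lines kept: 2"
        ```

        The subject is filtered, but both body lines of the MB entry still make it into the table.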

        With the careful application of labels, the fix appears to be the following:

        FILE:
        for my $file (@ARGV) {

          while (<>) {
            if (/^subject:/) {
              s/^subject://;
              next FILE if /^\s*\[MarkovBlogger\]/;
              fill_table(table   => $subject,
                         state   => $s_in,
                         line    => $_,
                         'keys'  => \@last_subject_keys
                        );
            } else {
              fill_table(table   => $body,
                         state   => $s_in,
                         line    => $_,
                         'keys'  => \@last_body_keys,
                        );
            }
          }
        }
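
        One caveat worth checking before trusting the labeled version: <> keeps its own position in the current file (and shifts @ARGV as it opens files), so jumping to the next iteration of the outer loop does not by itself make the next <> read start on a new file. A hedged sketch of one way to really abandon the current file while still using <> is to close ARGV. The two temp files and the @subjects and @bodies arrays below are invented for the demonstration and stand in for the real source files and fill_table().

        ```perl
        use strict;
        use warnings;
        use File::Temp qw(tempfile);

        # Build two throwaway files: one MB entry, one ordinary entry (both hypothetical).
        my @entries = (
          "subject: [MarkovBlogger] Snake eats tail\nmb body\n",
          "subject: OSCON\nreal body\n",
        );
        my @paths;
        for my $text (@entries) {
          my ($fh, $path) = tempfile(UNLINK => 1);
          print {$fh} $text;
          close $fh or die "close failed: $!";
          push @paths, $path;
        }

        # Stand-ins for the tables that fill_table() would populate.
        my (@subjects, @bodies);

        @ARGV = @paths;
        while (<>) {
          if (/^subject:/) {
            s/^subject://;
            if (/^\s*\[MarkovBlogger\]/) {
              close ARGV;  # drop the current file; the next <> read opens the following one
              next;
            }
            push @subjects, $_;
          } else {
            push @bodies, $_;
          }
        }

        printf "kept %d subject(s) and %d body line(s)\n",
               scalar @subjects, scalar @bodies;
        ```

        Only the ordinary entry survives here; the body of the MB entry is never read. Opening each file explicitly inside the for loop is another way to get the same effect.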

        This sort of bug would never have happened in Java, because I would never have tried to write this in Java. :-)

        • FILE:
          for my $file (@ARGV) {

            if (open my $in, '<', $file) {
              while (<$in>) {
                if (/^subject:/) {
                  s/^subject://;

                  if (/^\s*\[MarkovBlogger\]/) {
                    close $in;
                    next FILE;
                  }

                  fill_table(table   => $subject,
                             state