NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I thought I'd mention that this has, predictably, very little effect on normal messages, like those I used in previous profiling. These messages had 50 headers and 10,000 body lines:

    ~/code/pep/Email-Simple/trunk$ perl -I lib readmail big.msg     227@225624:1086
    just started                :   1364    28328
    after require File::Slurp   :   2232    28704
    after slurping         

  • Change this

        while ($head_txt =~ m/\G(.+?)$mycrlf/g) {

    to

        while ($head_txt =~ m/\G(.+?)$mycrlf/go) {

    I'm assuming the $mycrlf never changes, right?


    • I'm afraid that $mycrlf does change. It's the line ending that's used in the given email. There are two regexes involved: $crlf and $mycrlf. The former matches any valid line ending; the latter matches only the one expected in this header.

      If I use $crlf, I should be safe, and that is constant. Switching to use that and enabling /o didn't really help, though.
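A minimal sketch of why /o is risky with a per-message line ending, and of the usual alternative of compiling the pattern with qr//. The sub names and sample messages are made up for illustration; this is not Email::Simple's code. With /o the pattern is built from the first value of $eol and never rebuilt, so a later call with a different line ending silently keeps the old one.

```perl
use strict;
use warnings;

# Hypothetical helper: count header lines, pattern frozen by /o.
sub count_lines_o {
    my ($text, $eol) = @_;
    my $n = 0;
    # /o compiles the pattern on first use and ignores later changes to $eol.
    $n++ while $text =~ m/\G(.+?)$eol/go;
    return $n;
}

# Hypothetical helper: same loop, but the pattern is compiled per call.
sub count_lines_qr {
    my ($text, $eol) = @_;
    my $re = qr/\G(.+?)$eol/;    # compiled once for this call's line ending
    my $n  = 0;
    $n++ while $text =~ /$re/g;
    return $n;
}

my $crlf_msg = "To: a\r\nFrom: b\r\n";
my $lf_msg   = "To: a\nFrom: b\n";

my $qr_crlf = count_lines_qr($crlf_msg, "\r\n");
my $qr_lf   = count_lines_qr($lf_msg,   "\n");

my $o_crlf  = count_lines_o($crlf_msg, "\r\n");   # pattern frozen with CRLF here
my $o_lf    = count_lines_o($lf_msg,   "\n");     # still looks for CRLF

print "qr//: $qr_crlf $qr_lf\n";
print "/o:   $o_crlf $o_lf\n";
```

On any reasonably recent perl the engine already skips recompilation when the interpolated string has not changed, which fits the observation that /o "didn't really help".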

      Here's the better thing, though. It seems that I was wrong in my belief that I couldn't use a /g pattern on a dereferenced string. I don't know why I co
      • In this instance, anyway, it did not. Perhaps it would if I tested under an older perl.

        MJD once gave me and/or the internet an explanation of when it helped, and it was a much smaller case than I had previously thought, so now I usually don't even think about it. :-/
  • The problem is that I apparently can't use a pattern with /g on a dereferenced scalar.

    You certainly can. I use this technique extensively in CAM::PDF to incrementally parse a PDF document into a DOM. I pass a scalar reference to the content from one sub to the next.

    Without delving too deep into your code, I believe the important bit that you are missing is the "c" flag on the regexp.
    • Well, I re-added my /g regexp (as noted in another comment) this morning, and it worked, even when matching against $$string. I did not add /c, though. I don't know what I did wrong yesterday!

      I didn't know about /c, though. It doesn't appear to be documented in perlre, although it's sometimes hard to search for one-character things. It does appear in perlreref.

      I'm glad to know about it, but it should not be an issue here: at no point should the regex fail and then need to match again.
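A small sketch of the technique discussed in this subthread: matching with \G and /g against a dereferenced scalar, with and without /c. The sub name and the sample header text are hypothetical. The match position lives on the referenced scalar itself, so it carries over between calls; /c (described in perlop under m//) only changes what happens to that position when a match fails.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: pulls one "Name: value" line per call from a
# string passed by reference, so the caller's buffer is never copied.
sub next_header {
    my ($text_ref) = @_;
    # pos($$text_ref) persists between calls; /c leaves it untouched
    # when this match fails instead of resetting it.
    if ($$text_ref =~ m/\G([^:\n]+):[ \t]*([^\n]*)\n/gc) {
        return ($1, $2);
    }
    return;
}

my $head = "To: a\@example.com\nSubject: hi\nnot a header line\n";

my @pairs;
while (my ($name, $value) = next_header(\$head)) {
    push @pairs, [$name, $value];
}
my $stopped_at = pos($head);    # still defined: offset of the unparsed line

# Same loop without /c: the final failed match resets pos to undef.
my $copy = $head;
1 while $copy =~ m/\G[^:\n]+:[^\n]*\n/g;
my $after_plain_g = pos($copy);

print "parsed ", scalar(@pairs), " headers\n";
```

So /g alone is enough when the loop simply ends on the first failure, as described above; /c matters when a parser wants to try a different pattern at the same position after a failure.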

  • Is there any reason why, back when you had the hash and the array, you didn't just mess a little with the symbol table to make them effectively point to the same variables? That would cut memory use almost in half (well, there is still the sort order array...) without having to give up your initial strategy for easy lookups.
    • What do you suggest "the same variables" are that they should point to?
      • Sorry, I completely slipped; the morning coffee had not quite kicked in. My apologies. I thought you were doing something completely different.
      • Is it possible to use one big structure, where each of the three structures you want are sub-parts of it?

        As you construct the second and third parts, you can reference the first copy:

        my %a;
        $a{b} = { c => 'd' };
        $a{e}[0] = $a{b};

        Then, dumping it out, you see:

        # Notice the self reference...
        $VAR1 = {
                  'e' => [
                             'c' => 'd'

        • I looked at doing this, but managing all the references seemed like a colossal pain, and it would still use a fair amount of memory compared with storing one simple structure with fewer cached lookups.

          Alternatively, perhaps you could keep just one copy of the structure, and have methods to look up values in different ways as needed, without pre-building whole other structures that might not be used.

          That's what I did, effectively. There is one structure, an array of pairs, and methods that let you do the normal things. You can say "give me the values with name Foo" and it does. It just uses a linear search.
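A rough sketch of that idea: one array of [name, value] pairs, with lookups done by linear search. The package name PairHeader and its methods are invented for illustration and are not Email::Simple's actual interface. Header names are compared case-insensitively here, which is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a header object backed by a single structure.
package PairHeader;

sub new {
    my ($class, @pairs) = @_;    # each element is [ $name, $value ]
    return bless { pairs => [@pairs] }, $class;
}

# Return all values for a name (list context) or the first one (scalar).
sub header {
    my ($self, $name) = @_;
    my @found = map  { $_->[1] }
                grep { lc($_->[0]) eq lc($name) } @{ $self->{pairs} };
    return wantarray ? @found : $found[0];
}

# Names in original order, without duplicates.
sub header_names {
    my ($self) = @_;
    my %seen;
    return grep { !$seen{ lc $_ }++ } map { $_->[0] } @{ $self->{pairs} };
}

package main;

my $h = PairHeader->new(
    [ Received => 'from a' ],
    [ To       => 'x@example.com' ],
    [ Received => 'from b' ],
);

my @received = $h->header('received');
my $to       = $h->header('To');
my @names    = $h->header_names;

print "Received: $_\n" for @received;
```

Each lookup walks the whole array, so the cost grows with the number of headers, but nothing is stored twice and the original order comes for free.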

          Since Email::Simple 2 will have a Header object with a known interface, a more memory-hungry but faster impleme