NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.




Journal of bart (450)

Monday May 14, 2007
04:32 PM

Corrupt headers in Hotmail's bounce mails

[ #33275 ]

Lately I've been working on a script to parse and classify mails that come in after a bulk mail has been sent, most of them in the form of bounces. Of the roughly 2000 mails, 2 had corrupt headers. Guess where they both originated from? Oh, yeah, I already told you in the post title: hotmail.com.

The problem with these two mails is that in the middle of the mail headers there's a blank line, followed by a line that starts with 3 garbage characters and then "From: ". Apart from those 3 garbage characters, it's the real "From:" line.
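Why the blank line matters: a mail parser stops reading headers at the first empty line, so everything after it, including the damaged "From:" line, lands in the body. Below is a minimal Perl sketch of that split, plus a heuristic check for a "body" that begins with something header-like (optionally prefixed by the three stray bytes). The helper names `split_headers` and `body_starts_with_header` are hypothetical, and the header-name pattern is a simplification of what RFC 5322 allows, so an ordinary body line such as "Note: ..." would also trigger it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a raw message into header block and body
# at the first empty line, the way a mail parser does.
sub split_headers {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    return ($head, defined $body ? $body : '');
}

# Hypothetical heuristic: does the "body" start with a header-like line,
# optionally preceded by the 3 stray bytes EF BB BF?
# (Simplified field-name pattern, not the full RFC 5322 grammar.)
sub body_starts_with_header {
    my ($body) = @_;
    return $body =~ /\A(?:\357\273\277)?[A-Za-z0-9-]+:[ \t]/ ? 1 : 0;
}

# Made-up sample shaped like the broken bounces.
my $raw = "Received: from example.invalid\n"
        . "Subject: Delivery failure\n"
        . "\n"
        . "\357\273\277From: postmaster\@example.invalid\n"
        . "To: someone\@example.invalid\n"
        . "\n"
        . "Your message could not be delivered.\n";

my ($head, $body) = split_headers($raw);
print "headers:\n$head\n";
print "header block looks cut short\n" if body_starts_with_header($body);
```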

You would expect that a huge company under the Microsoft umbrella would be capable of getting their stuff right. I think it's quite typical that they don't. Can't. Won't.

Can anybody explain what the origin of this garbage could be? I have no idea. In Perl, you can match it with /\357\273\277/.
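A minimal sketch of using that pattern on raw header lines: match the three bytes, then strip them where they start a line. `strip_garbage` is a hypothetical helper name, and the sketch assumes the mail was read as undecoded bytes (no `:encoding` layer), which is the situation the octal pattern is written for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove the 3 stray bytes (EF BB BF, written in
# octal as in the post) wherever they start a line. Assumes $text holds
# raw bytes, not decoded characters.
sub strip_garbage {
    my ($text) = @_;
    $text =~ s/^\357\273\277//mg;
    return $text;
}

my $line = "\357\273\277From: postmaster\@example.invalid\n";

# The pattern from the post; /\xEF\xBB\xBF/ is the same thing in hex.
print "found the stray bytes\n" if $line =~ /\357\273\277/;
print strip_garbage($line);
```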

  • Rewriting in hex notation makes those bytes a little more familiar:

    $ perl -we 'printf "%02x %02x %02x\n", 0357, 0273, 0277'
    ef bb bf

    If that’s still not familiar, Encode might help:

    $ perl -MEncode -we 'printf "U+%04X\n", ord decode_utf8("\xef\xbb\xbf")'
    U+FEFF

    And what’s U+FEFF?

    $ perl -MUnicode::UCD=charinfo -lwe 'print charinfo(0xFEFF)->{name}'
    ZERO WIDTH NO-BREAK SPACE

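    It’s a zero-width non-breaking space, also known as a “byte-order mark” (BOM). At the start of a document it signals the byte order of UTF-16 or UTF-32 text; some software also puts it at the start of UTF-8 text, despite the fact that it doesn’t typically provide any benefits under UTF-8.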

    • Pah, you beat me to the punch. :-)

    • Sheesh, a UTF-8 BOM! I'd never have thought in that direction.

      > despite the fact that it doesn’t typically provide any benefits under UTF-8

      Well, it's a marker that the following text is in UTF-8, so it is of some use, if limited.

      Of course, it is completely out of place in mail headers.

      But I'm still wondering about the extra newline in front of the mangled "From:" header. Did Hotmail put it there, or did an intermediate SMTP server see the mangled header and separate it from the other, properly formed headers above it?
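Following the Encode one-liner in the thread: if the header text is decoded from UTF-8 first, the three bytes become the single character U+FEFF, which can then be stripped at character level instead of byte level. A minimal sketch using only `Encode`'s `decode_utf8` and `encode_utf8`; the sample header is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Raw bytes as they appear in the mail: EF BB BF, then the header.
my $bytes = "\xEF\xBB\xBF" . "From: postmaster\@example.invalid";

# Decoding turns the three bytes into one character, U+FEFF.
my $chars = decode_utf8($bytes);
printf "%d bytes, %d characters\n", length $bytes, length $chars;

# Strip a leading U+FEFF at character level, then re-encode for output.
(my $clean = $chars) =~ s/\A\x{FEFF}//;
print encode_utf8($clean), "\n";
```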