NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

  • Rewriting in hex notation makes those bytes a little more familiar:

    $ perl -we 'printf "%02x %02x %02x\n", 0357, 0273, 0277'
    ef bb bf

    If that’s still not familiar, Encode might help:

    $ perl -MEncode -we 'printf "U+%04X\n", ord decode_utf8("\xef\xbb\xbf")'
    U+FEFF

    And what’s U+FEFF?

    $ perl -MUnicode::UCD=charinfo -lwe 'print charinfo(0xFEFF)->{name}'
    ZERO WIDTH NO-BREAK SPACE

    It’s a zero-width non-breaking space, also known as a “byte-order mark” (BOM). At the start of a document a zero-width non-breaking space has no visual effect, so it was originally intended to let programs distinguish little-endian from big-endian 16-bit encodings of Unicode: a decoder reading the bytes in the wrong order sees U+FFFE instead of U+FEFF. (No Unicode character will ever be assigned the code point U+FFFE, so it’s safe to use it that way.) Eventually the same technique was applied to UTF-8, even though UTF-8 has no byte order to signal, so the mark typically provides no benefit there and is often actively harmful.

    So it seems that Hotmail’s server sometimes generates bounces that are inappropriately in a non-US-ASCII encoding and that also inappropriately begin with a byte-order mark. This is what would technically be described as a bug.

    The Wikipedia article on byte-order marks [wikipedia.org] may be helpful.

    • Pah, you beat me to the punch. :-)

    • Sheesh, a UTF-8 BOM; I'd never have thought in that direction.

      despite the fact that it doesn’t typically provide any benefits under UTF-8

      Well, it marks the following text as UTF-8, so it may be somewhat useful, if only to a limited extent.

      Of course, it is completely out of place in mail headers.

      But I'm still wondering about the extra newline in front of the mangled From header. Did Hotmail put it there, or did an intermediate SMTP server see the mangled header and separate it from the properly formed headers above it?
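A minimal Perl sketch of the byte-order-mark handling discussed in this thread: it reports which BOM, if any, starts a byte string (only the UTF-8 form and the two 16-bit forms mentioned above), and strips a leading UTF-8 BOM, as one might before parsing a header line like the one in the Hotmail bounce. The names detect_bom and strip_utf8_bom and the sample header are hypothetical, for illustration only; this is not how Hotmail or any mail library actually handles it.

```perl
use strict;
use warnings;

# Raw bytes that U+FEFF (the byte-order mark) takes in each encoding.
# Hypothetical table covering only the encodings discussed above.
my @BOMS = (
    [ "\xEF\xBB\xBF" => 'UTF-8'    ],
    [ "\xFE\xFF"     => 'UTF-16BE' ],   # big-endian: FE FF
    [ "\xFF\xFE"     => 'UTF-16LE' ],   # little-endian: swapped, reads as U+FFFE if misdecoded
);

# Return the encoding name implied by a leading BOM, or undef if none.
sub detect_bom {
    my ($bytes) = @_;
    for my $bom (@BOMS) {
        my ($mark, $name) = @$bom;
        return $name if substr($bytes, 0, length $mark) eq $mark;
    }
    return undef;
}

# Remove a UTF-8 BOM (EF BB BF) from the very start of a byte string.
sub strip_utf8_bom {
    my ($line) = @_;
    $line =~ s/\A\xEF\xBB\xBF//;
    return $line;
}

# Hypothetical header line resembling the mangled bounce.
my $header = "\xEF\xBB\xBF" . 'From: postmaster@example.invalid';
printf "BOM: %s\n", detect_bom($header) // 'none';   # BOM: UTF-8
print strip_utf8_bom($header), "\n";                 # From: postmaster@example.invalid
```

Checking the UTF-8 pattern first is only a matter of taste here, since none of the three byte sequences is a prefix of another.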