Lately I've been working on a script to parse and classify the mails that come in after a bulk mailing has been sent, most of them bounces. Of roughly 2000 mails, 2 had corrupt headers. Guess where they both came from? Oh yeah, I already told you in the post title: hotmail.com.
The problem with these two mails is that in the middle of the headers there's a blank line, followed by a line starting with 3 garbage characters and then "From: ". Apart from those 3 garbage characters, it's the real "From:" line.
You would expect a huge company under the Microsoft umbrella to be capable of getting its stuff right. I think it's quite typical that they don't. Can't. Won't.
Can anybody explain where this garbage could come from? I have no idea. In Perl, you can match it with /\357\273\277/.
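Until the cause is known, a parser could at least recognise this specific corruption. The following is a minimal sketch, assuming the damage always looks like the two mails described above (a bogus blank line inside the headers, then the bytes \357\273\277 glued to a "From: " line) and that the raw message is held as an undecoded byte string. The helper name repair_bom_headers and the sample addresses are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: repairs a raw (undecoded) message whose header block
# is interrupted by a blank line followed by the bytes \357\273\277 and a
# "From: " line. Any other message is returned unchanged.
sub repair_bom_headers {
    my ($raw) = @_;

    # Split at the first empty line, which normally ends the headers.
    # The capturing group keeps the separator, so we get 3 pieces.
    my ($head, $sep, $rest) = split /(\r?\n\r?\n)/, $raw, 2;
    return $raw unless defined $sep;

    # If the "body" starts with the three garbage bytes plus "From: ",
    # the blank line was bogus: drop the bytes and glue the line back on.
    if ($rest =~ s/\A\357\273\277(?=From: )//) {
        my ($eol) = $sep =~ /(\r?\n)/;    # keep the mail's own line ending
        return $head . $eol . $rest;
    }
    return $raw;
}

my $msg = "Received: from example.invalid\n"
        . "Subject: Undeliverable\n"
        . "\n"
        . "\357\273\277From: postmaster\@example.invalid\n"
        . "\n"
        . "Body text\n";

my $fixed = repair_bom_headers($msg);
print $fixed;
```

Only the first blank line is examined, so a genuine body that happens to begin with those bytes followed by "From: " would also be rejoined; for bounce classification that trade-off seems acceptable, but it is a judgment call.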
Origin of those bytes
Rewriting in hex notation makes those bytes a little more familiar: \xEF\xBB\xBF.
If that’s still not familiar, Encode might help:
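A minimal sketch of both steps, using Perl's core Encode module: print the three bytes in hex, then decode them as UTF-8 and report the resulting code point.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three bytes from the corrupt header, as octal escapes
# (the same bytes matched by /\357\273\277/).
my $bytes = "\357\273\277";

# Hex notation: one two-digit hex number per byte.
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
print "$hex\n";    # EF BB BF

# Decoding the bytes as UTF-8 yields a single character.
my $chars = decode('UTF-8', $bytes);
printf "%d character: U+%04X\n", length($chars), ord($chars);    # 1 character: U+FEFF
```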
And what’s U+FEFF?
It’s a zero-width non-breaking space, also known as a “byte-order mark”. At the start of a docu
Re:
Pah, you beat me to the punch. :-)
Re:
> despite the fact that it doesn’t typically provide any benefits under UTF-8
Well, it's a marker that the following text is in UTF-8, so it is of some use, if limited.
Of course, it is completely out of place in mail headers.
But I'm still wondering about the extra newline in front of the mangled From header. Did Hotmail put it there, or did an intermediate SMTP server see the mangled header and separate it from the other, properly formed, headers above it?