
TorgoX (1933)

TorgoX
  sburkeNO@SPAMcpan.org
http://search.cpan.org/~sburke/

"He is as beautiful as the retractility of the claws of birds of prey [...] and above all, as the chance meeting on a dissecting table of a sewing machine and an umbrella!" -- Lautréamont

Journal of TorgoX (1933)

Saturday February 23, 2002
02:51 PM

HASSAN CHOP

[ #3073 ]
Dear Log,

Putting my corpus linguistics superpowers to use, I recently surveyed all the various Oriental junk mail I've been getting lately, and decided that the most common strings to go on a killing spree for are:

  • µÄ ("\xb5\xc4")
  • ±â ("\xb1\xe2")
  • [escape]$B ("\e\$B")

Those come up in email that's in Asian encodings even when not declared as such in the MIME headers.
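
A check for the three strings above might look something like the following. This is only a rough sketch in Python, not the filter actually used here: the names `SPAM_MARKERS` and `looks_like_undeclared_asian_mail` are made up for illustration, and it assumes you have the raw, undecoded bytes of the message in hand.

```python
# Byte strings copied from the list above. Everything else here is hypothetical.
SPAM_MARKERS = (
    b"\xb5\xc4",  # displays as "µÄ" when the bytes are read as Latin-1
    b"\xb1\xe2",  # displays as "±â" when the bytes are read as Latin-1
    b"\x1b$B",    # ESC, then "$B"
)


def looks_like_undeclared_asian_mail(raw_message: bytes) -> bool:
    """Return True if any marker byte string occurs anywhere in the raw message."""
    return any(marker in raw_message for marker in SPAM_MARKERS)
```

Matching on raw bytes, rather than on the decoded text, is what lets this work when the MIME headers say nothing useful about the charset. The obvious cost is false positives: any legitimate mail that happens to contain one of these byte pairs will trip it too.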

  • The first appears to have been chosen for "de" (roughly, "of") in Simplified Chinese (GB2312). The third looks like the JIS escape sequence which begins Japanese text in "JIS" encoding. But what is the second one? Is it supposed to be Korean "gi" in KSC 5601? And aren't you missing some typical Big5 character for Taiwanese stuff? (The second character would, in Big5, be a character which means "last day of the month or year", if I'm not mistaken, but that's probably not particularly common.)
    -- 
    Esli epei eto cumprenan, shris soa Sfaha.
    Aettot ibrec epesecoth, spakhea scrifeteis.
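
One way to explore the question raised in the comment above is to decode the two byte pairs under each candidate encoding and look at what comes out. The sketch below is in Python and uses its standard codec names `gb2312`, `euc_kr` (standing in for KSC 5601), and `big5`. The helper name `decode_under_each` is hypothetical, and the output will only be as accurate as those codec tables.

```python
# Candidate encodings named in the comment; these are Python's standard codec names.
CANDIDATE_CODECS = ("gb2312", "euc_kr", "big5")


def decode_under_each(raw: bytes) -> dict:
    """Map each codec name to the decoded text, or None if the bytes are invalid there."""
    results = {}
    for codec in CANDIDATE_CODECS:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


# Print every interpretation of the two byte pairs from the journal entry.
for pair in (b"\xb5\xc4", b"\xb1\xe2"):
    print(pair, decode_under_each(pair))
```

The third string is left out because it is an escape sequence that switches character sets rather than a character of its own, so there is nothing to decode until some text follows it.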