Journal of jtrammell (6222)

Wednesday August 02, 2006
09:20 PM

Bianca's Pesto

Ingredients:

  • basil leaves, about 4 cups
  • salt, 1/4 tsp.
  • virgin olive oil, about 1 cup
  • 1 clove garlic
  • 1/2 cup Parmesan, grated
  • 1/4 cup pine nuts

Wash and dry the basil leaves, then pack them firmly into the food processor. Add the remaining ingredients and process until creamy and stiff.

One batch yields about 2 cups of pesto, or about 8 servings if you like it like I do.

Eat fresh pesto promptly. If you need to store it for more than a couple of days, freeze it; add a thin layer of olive oil to the top of the pesto to minimize freezer burn.

Tuesday August 01, 2006
11:04 AM

Regex for UTF-8 octets (from perlunicode)

From "perldoc perlunicode", the byte sequences that make up well-formed UTF-8:

 Code Points            1st Byte  2nd Byte  3rd Byte  4th Byte

   U+0000..U+007F       00..7F
   U+0080..U+07FF       C2..DF    80..BF
   U+0800..U+0FFF       E0        A0..BF    80..BF
   U+1000..U+CFFF       E1..EC    80..BF    80..BF
   U+D000..U+D7FF       ED        80..9F    80..BF
   U+D800..U+DFFF       ******* ill-formed *******
   U+E000..U+FFFF       EE..EF    80..BF    80..BF
  U+10000..U+3FFFF      F0        90..BF    80..BF    80..BF
  U+40000..U+FFFFF      F1..F3    80..BF    80..BF    80..BF
 U+100000..U+10FFFF     F4        80..8F    80..BF    80..BF

And the equivalent regex, which matches a single well-formed UTF-8 character:

qr{
        (?:
                                                [\x00-\x7f]  #   U+0000 .. U+007F
        |
                                    [\xc2-\xdf] [\x80-\xbf]  #   U+0080 .. U+07FF
        |
                               \xe0 [\xa0-\xbf] [\x80-\xbf]  #   U+0800 .. U+0FFF
        |
                        [\xe1-\xec] [\x80-\xbf] [\x80-\xbf]  #   U+1000 .. U+CFFF
        |
                               \xed [\x80-\x9f] [\x80-\xbf]  #   U+D000 .. U+D7FF
        |
                        [\xee-\xef] [\x80-\xbf] [\x80-\xbf]  #   U+E000 .. U+FFFF
        |
                   \xf0 [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]  #  U+10000 .. U+3FFFF
        |
            [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]  #  U+40000 .. U+FFFFF
        |
                   \xf4 [\x80-\x8f] [\x80-\xbf] [\x80-\xbf]  # U+100000 .. U+10FFFF
        )
}x;
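To check the regex against the table, here is a small sketch that encodes the first and last code point of each well-formed row and confirms the pattern accepts each one as exactly one character, then feeds it a few sequences the table rules out. The helpers `utf8_octets`, `is_one_char` and `hex_bytes` are hypothetical names for illustration. `utf8_octets` applies the UTF-8 bit layout by hand and deliberately does no validity checking, so it can also produce a surrogate or an out-of-range sequence to test rejection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compact form of the pattern above (one well-formed UTF-8 character),
# repeated here so this block runs on its own.
my $utf8_char = qr{
      [\x00-\x7F]
    | [\xC2-\xDF] [\x80-\xBF]
    | \xE0        [\xA0-\xBF] [\x80-\xBF]
    | [\xE1-\xEC] [\x80-\xBF]{2}
    | \xED        [\x80-\x9F] [\x80-\xBF]
    | [\xEE-\xEF] [\x80-\xBF]{2}
    | \xF0        [\x90-\xBF] [\x80-\xBF]{2}
    | [\xF1-\xF3] [\x80-\xBF]{3}
    | \xF4        [\x80-\x8F] [\x80-\xBF]{2}
}x;

# Hypothetical helper: encode a code point by hand with the UTF-8 bit layout.
# No validity checks, so surrogates and values above U+10FFFF get "encoded" too.
sub utf8_octets {
    my ($cp) = @_;
    return chr($cp) if $cp < 0x80;                          # 0xxxxxxx
    return pack('C2', 0xC0 | ($cp >> 6),                    # 110xxxxx
                      0x80 | ($cp & 0x3F)) if $cp < 0x800;  # 10xxxxxx
    return pack('C3', 0xE0 | ($cp >> 12),                   # 1110xxxx
                      0x80 | (($cp >> 6) & 0x3F),
                      0x80 | ($cp & 0x3F)) if $cp < 0x10000;
    return pack('C4', 0xF0 | ($cp >> 18),                   # 11110xxx
                      0x80 | (($cp >> 12) & 0x3F),
                      0x80 | (($cp >> 6) & 0x3F),
                      0x80 | ($cp & 0x3F));
}

# Hypothetical helper: true if $bytes is exactly one well-formed character.
sub is_one_char {
    my ($bytes) = @_;
    return $bytes =~ /\A$utf8_char\z/ ? 1 : 0;
}

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $_[0];
}

# First and last code point of each well-formed row of the table.
my @row_edges = (0x0000, 0x007F, 0x0080, 0x07FF, 0x0800, 0x0FFF,
                 0x1000, 0xCFFF, 0xD000, 0xD7FF, 0xE000, 0xFFFF,
                 0x10000, 0x3FFFF, 0x40000, 0xFFFFF, 0x100000, 0x10FFFF);
for my $cp (@row_edges) {
    my $oct = utf8_octets($cp);
    printf "U+%04X %-12s %s\n", $cp, hex_bytes($oct),
        is_one_char($oct) ? 'accepted' : 'REJECTED';
}

# Sequences the table rules out.
my @ill_formed = (
    [ 'overlong U+0000',   "\xC0\x80" ],
    [ 'overlong U+0000',   "\xE0\x80\x80" ],
    [ 'surrogate U+D800',  utf8_octets(0xD800) ],
    [ 'above U+10FFFF',    utf8_octets(0x110000) ],
    [ 'lone Latin-1 byte', "\xE9" ],
);
for my $case (@ill_formed) {
    my ($what, $oct) = @$case;
    printf "%-18s %-12s %s\n", $what, hex_bytes($oct),
        is_one_char($oct) ? 'ACCEPTED' : 'rejected';
}
```

Every row edge should come back "accepted" and every ill-formed case "rejected"; the surrogate case shows why the ED row stops the second byte at 9F.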

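This has proven useful as I search for errant Latin-1 characters embedded in otherwise UTF-8 files: a stray Latin-1 byte in the range 80..FF is usually not followed by the continuation bytes the table requires, so it fails to match.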
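A minimal sketch of that search, assuming the files are read in `:raw` mode so the regex sees octets rather than decoded characters. It walks each file with `\G` and `/gc`, consuming one well-formed character at a time and reporting the offset of any octet that cannot start one. `find_bad_octets` is a hypothetical helper, not something from perlunicode.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compact form of the pattern above (one well-formed UTF-8 character),
# repeated here so this block runs on its own.
my $utf8_char = qr{
      [\x00-\x7F]
    | [\xC2-\xDF] [\x80-\xBF]
    | \xE0        [\xA0-\xBF] [\x80-\xBF]
    | [\xE1-\xEC] [\x80-\xBF]{2}
    | \xED        [\x80-\x9F] [\x80-\xBF]
    | [\xEE-\xEF] [\x80-\xBF]{2}
    | \xF0        [\x90-\xBF] [\x80-\xBF]{2}
    | [\xF1-\xF3] [\x80-\xBF]{3}
    | \xF4        [\x80-\x8F] [\x80-\xBF]{2}
}x;

# Hypothetical helper: return [offset, byte value] for every octet in
# $bytes that does not begin a well-formed UTF-8 character.
sub find_bad_octets {
    my ($bytes) = @_;
    my @bad;
    pos($bytes) = 0;
    while (pos($bytes) < length($bytes)) {
        # /gc: on success pos() advances past the character;
        # on failure pos() stays where it was.
        next if $bytes =~ /\G$utf8_char/gc;
        my $off = pos($bytes);
        push @bad, [ $off, ord substr($bytes, $off, 1) ];
        pos($bytes) = $off + 1;    # skip the bad octet and resynchronize
    }
    return @bad;
}

for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Can't open $file: $!\n";
    my $data = do { local $/; <$fh> };    # slurp the whole file as bytes
    $data = '' unless defined $data;
    close $fh;
    for my $hit (find_bad_octets($data)) {
        printf "%s: offset %d: byte 0x%02X\n", $file, @$hit;
    }
}
```

Pass file names on the command line. It only flags octets that break UTF-8; Latin-1 text that happens to form a valid sequence (such as C3 A9, "Ã©") goes unnoticed.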