miyagawa (1653)

http://bulknews.vox.com/

Journal of miyagawa (1653)

Monday April 21, 2008
02:40 PM

YAPC::Asia 2008 talks announced

Based on votes from attendees, we have decided on the second round of accepted talks. We now have 53 talks, and they all look so interesting! Go check the list on the schedule page.

We'll announce the program next week, hopefully with the "Personalized Schedule" functionality built on top of Act during this weekend's hackathon!

Tuesday April 15, 2008
06:00 PM

Act Hackathon planned next week

The YAPC::Asia 2008 organizers would like to thank Eric Cholet, the author of Act, for the great conference organizing software that powers most YAPCs and Perl Workshops.

To show our appreciation the hacker's way, I'm flying to Paris, France next weekend (April 25-28), funded by YAPC::Asia's possible profit, to work on Act feature enhancements.

We plan to work on these things because we want them for YAPC::Asia:

* OpenID provider support
* Better Japanese names display (i18n)
* Embed videos and slides (YouTube, Google Video, Slideshare etc.) in talks
* Personal scheduling (who is attending which talks) like Sched.org or icalico
* Online check-in API (Who actually showed up when)
* Promotional code / coupon for discounted payments

We (at least, I) prioritize implementing these because the trip is funded by YAPC::Asia, but if there's anything you think is missing from Act, I'd love to hear about it. Remote participation (#act on irc.perl.org during the weekend) would be welcome too!

Friday March 21, 2008
12:12 AM

YAPC::Asia 2008 talks announced

The YAPC::Asia 2008 website got a redesign, along with the announcement of sponsors and the initial set of talks (currently 33 talks and more to come!).

We have Larry Wall and Michael Schwern as keynote speakers this year. Tickets will go on sale on Tuesday, March 25th, local time. It's a YAPC::Asia tradition that the 300 tickets sell out in a week, so don't miss it.

Wednesday February 20, 2008
05:28 AM

Three levels of Perl/Unicode understanding

(Editorial: Don't frontpage this post, editors. I'm writing it down here to summarize my thoughts, hoping to get feedback from my trusted readers and NOT flame wars or another giant thread of UTF-8 flag woes.)

Just lately I can finally say that I fully grok Unicode, the UTF-8 flag and all that stuff in Perl. Here is an analysis of how Perl programmers understand Unicode and the UTF-8 flag.

(This post might need more code to demonstrate and visualize what I'm talking about, but I'll leave that as homework for readers, or at least as something for me to do before YAPC::Asia if there's demand for this talk :))

Level 1. "Take that annoying flag off, dude!"

They, typically web application developers, assume all data is encoded in UTF-8. If they encounter some wacky garbled characters (a.k.a. Mojibake in Japanese), which they think is a perl bug, they just make an ad-hoc call of:

Encode::_utf8_off($stuff)

to take the UTF-8 flag off and make sure all data stays in UTF-8 by avoiding any possible latin-1 to UTF-8 auto-upgrades.

This is Level 1. Unfortunately, this works okay as long as their data is actually encoded only in UTF-8 (the database is UTF-8, the web page is displayed in UTF-8, the data sent from browsers is UTF-8, etc.). Their app is still broken when they call things like length(), substr() or regular expressions, because the strings are not UTF-8 flagged and those functions don't work with Unicode semantics: they operate on bytes, not characters.
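A minimal sketch of that breakage, using only the core Encode module. The sample text and variable names are made up for illustration; the byte string stands in for whatever a Level 1 app receives from a browser or database.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for the three characters of "メイン" (illustrative sample data)
my $bytes = "\xe3\x83\xa1\xe3\x82\xa4\xe3\x83\xb3";

# The same text decoded into a character string
my $chars = decode('UTF-8', $bytes);

my $byte_len = length $bytes;    # counts bytes
my $char_len = length $chars;    # counts characters

# substr() on the byte string cuts through the middle of a character
my $first_byte = substr $bytes, 0, 1;
my $first_char = substr $chars, 0, 1;

printf "bytes: %d, characters: %d\n", $byte_len, $char_len;
```

The byte string reports a length of 9 for a three-character word, and its first "character" is only a fragment of one.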

They can optionally use "use encoding 'utf-8'" or the CPAN module encoding::warnings to avoid auto-upgrades altogether or to catch such mistakes, or use Unicode::RecursiveDowngrade to turn off the UTF-8 flag on complex data structures.

Level 2. "Unicode strings have UTF-8 flags. That's easy"

They make extensive use of the Encode module's encode() and decode() to make sure all data in their app is UTF-8 flagged. Their app works really nicely with Unicode semantics.

They sometimes need to deal with UTF-8 bytes in addition to UTF-8 flagged strings. In that case, they use hacky modules such as ForceUTF8, or do things like

utf8::encode($_) if utf8::is_utf8($_)

on the assumption that "Unicode strings should be UTF-8 flagged, and those without the flag are assumed to be UTF-8 bytes."

This is Level 2. This is a straight upgrade from Level 1 and fixes some of its issues (string functions not working with Unicode semantics, etc.), but it's still too UTF-8 centric. They ignore why perl5 treats strings this way, and still hate SV auto-upgrades.

To be honest, I was thinking this way until around early 2007. There are a couple of my modules on CPAN that accept both UTF-8 flagged strings and UTF-8 bytes, because I thought it'd be handy, but that actually breaks latin-1 strings if they're not UTF-8 flagged. That is rare in UTF-8 centric web application development, but it could still happen.
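Here is a small sketch of how that goes wrong. The guess_decode() helper is hypothetical, written only to imitate the "guess from the flag" style; it is not taken from any of the modules mentioned.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper in the Level 2 style: guess from the flag.
sub guess_decode {
    my ($string) = @_;
    return $string if utf8::is_utf8($string);    # flagged: assume characters
    return Encode::decode('UTF-8', $string);     # unflagged: assume UTF-8 bytes
}

my $latin1   = "caf\xe9";    # the characters "café"; perl stores this unflagged
my $upgraded = $latin1;
utf8::upgrade($upgraded);    # same characters, now stored with the flag on

# Both scalars hold the same text as far as Perl is concerned...
my $same_text = $latin1 eq $upgraded ? 1 : 0;

# ...but the helper treats them differently.
my $kept   = guess_decode($upgraded) eq "caf\xe9" ? 1 : 0;
my $broken = guess_decode($latin1)   eq "caf\xe9" ? 0 : 1;

print "same text: $same_text, flagged kept: $kept, unflagged broken: $broken\n";
```

The unflagged string is a perfectly valid character string, but because a lone \xE9 is not valid UTF-8, the helper mangles the final character.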

I gradually changed my mind when I talked with Marc Lehmann about how JSON::Syck's Unicode support is broken, and when I read the tutorial by Juerd Waalboer and attended his Perl Unicode tutorial talk at YAPC::EU.

Level 3. "Don't bother with the UTF-8 flag"

They stop guessing whether a variable is UTF-8 flagged or not. All they need to know is whether a string is bytes or characters, which they can tell from how the scalar variable was generated.

If it's bytes, use decode() to get Unicode strings. If it's characters, don't worry about whether it's UTF-8 flagged or not: if it's not flagged, it'll be auto-upgraded when needed thanks to Perl, so you don't need to know the internal representation.
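As a rough illustration of this workflow (decode on input, work with characters, encode on output), here is a sketch using the core Encode module. The sample strings and variable names are assumptions for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Input arrives as bytes (here: UTF-8 bytes for "café"), so decode it once.
my $input_bytes = "caf\xc3\xa9";
my $text = decode('UTF-8', $input_bytes);

# A character string that happens to be in the latin-1 range and unflagged.
my $prefix = "na\xefve ";

# No flag checks needed: Perl upgrades $prefix as required when joining.
my $joined = $prefix . $text;
my $length = length $joined;    # counted in characters
my $shout  = uc $joined;

# Encode only at the boundary, when bytes are needed again.
my $output_bytes = encode('UTF-8', $shout);
```

At no point does the code ask whether a scalar is flagged; it only tracks which values are bytes and which are characters.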

So it's like a step back from Level 2. "Get back to basics, and think about why Perl 5 does these latin-1 to UTF-8 auto-upgrades."

If your function or module needs to accept strings that might be either characters or bytes, just provide two different functions, or a flag for the caller to set explicitly. Don't auto-decode bytes as UTF-8, because that breaks latin-1 characters if they're not UTF-8 flagged. Of course the caller of the module can call utf8::upgrade() to make sure, but that's just a pain and not the perl5 way.
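One way the "two different functions" idea could look, as a sketch only. The package name My::Text and both function names are hypothetical; the point is that the caller says explicitly whether it is passing characters or bytes.

```perl
use strict;
use warnings;

package My::Text;    # hypothetical module name, for illustration only
use Encode ();

# Takes a character string. Never inspects the UTF-8 flag.
sub char_count {
    my ($chars) = @_;
    return length $chars;
}

# Takes bytes plus an explicit encoding name (UTF-8 if none is given).
sub char_count_from_bytes {
    my ($bytes, $encoding) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    return char_count(Encode::decode($encoding, $bytes));
}

package main;

my $from_chars  = My::Text::char_count("caf\xe9");
my $from_utf8   = My::Text::char_count_from_bytes("caf\xc3\xa9");
my $from_latin1 = My::Text::char_count_from_bytes("caf\xe9", 'ISO-8859-1');

print "$from_chars $from_utf8 $from_latin1\n";
```

All three calls count the same four characters, and the unflagged latin-1 character string is handled correctly because nothing tries to guess.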

There's still a remaining problem with CPAN modules, though. Some modules return character strings on some occasions and bytes otherwise. For instance, $c->req->param($foo) returns a UTF-8 flagged string if Catalyst::Plugin::Unicode is loaded and bytes otherwise. And using utf8::is_utf8($_) here might cause bugs like the ones described above.

Well, in the C::P::Unicode example, actually not. Using C::P::Unicode guarantees that parameters are all UTF-8 flagged, even if they contain only latin-1 range characters. Not using the plugin guarantees the parameters are not flagged at all. So it's a different story.

(To be continued...)

Monday February 18, 2008
05:14 PM

Submit your talks to YAPC::Asia 2008

The YAPC::Asia 2008 proposal deadline is 2/25, one week away. Submit your talk now. We welcome JavaScript-related talks as well as anything Perl.

Friday February 01, 2008
06:11 PM

OSCON talk

Wondering what talk I should submit to OSCON (and other YAPCs this year too!).

The obvious choice is Web::Scraper, since I haven't given this talk outside Europe and Japan, and I can make lots of updates before summer, when I'd give the actual talk (we call it CDD -- Conference Driven Development).

Any suggestions?

Friday January 18, 2008
07:44 PM

URI::Find::UTF8 -- Fun with Safari users

URI-Find is a great module for extracting URIs from arbitrary text, but unfortunately it doesn't work with the non-ASCII URLs that we often encounter when chatting with Safari users, such as: http://ja.wikipedia.org/wiki/メインページ

The reason why Safari users sometimes do this is that Safari shows the URI-decoded path in its location bar.

I hacked up and uploaded a URI::Find extension (subclass), URI::Find::UTF8, which can be a drop-in replacement for URI::Find, to extract URLs like this.

We have a Subversion repository too, if you want to take a look, or if you've found a bug and want to patch the code.
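Usage should look roughly like plain URI::Find, with a callback that receives the URI object and the original text. This is a sketch assuming URI::Find::UTF8 is installed from CPAN; the sample sentence and the @found array are made up for the example.

```perl
use strict;
use warnings;
use utf8;    # this source contains literal non-ASCII characters
use URI::Find::UTF8;

my $text = "Try http://ja.wikipedia.org/wiki/メインページ some time";

my @found;
my $finder = URI::Find::UTF8->new(sub {
    my ($uri, $orig_uri) = @_;
    push @found, $orig_uri;    # remember what was matched in the text
    return $orig_uri;          # leave the text itself unchanged
});

# find() takes a reference to the text and returns the number of URIs found
my $count = $finder->find(\$text);
```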

Tuesday January 15, 2008
06:27 PM

ActiveSupport equivalent to Perl

UPDATE: The module was originally written using constant overloading, but it is a dangerous and gross hack, so I changed that to use autobox framework instead (wondering why I didn't try that at first!). I updated the post accordingly.

Rails has ActiveSupport, which adds funky methods to Ruby core objects to do fancy things like 2.months.ago to get a time duration object, etc.

I found it pretty interesting and wondered if it's doable in Perl. Yes it is, using the autobox framework, which I hope is going to be in core in perl 5.12, or using constant overloading like bigint.pm does.

So here you are: autobox::DateTime::Duration, on CPAN and in the SVN repository if you can't wait for the CPAN mirrors to update. With this you can say:


use autobox;
use autobox::DateTime::Duration;

print 1->day->ago, "\n"; # 2008-01-14T23:25:53
print 2->minutes->from_now, "\n"; # 2008-01-15T23:28:20

and all the methods implemented in ActiveSupport::CoreExt::Numeric::Time, including the crazy fortnight method. Since the result is a standard DateTime::Duration object, you can also say this to save some typing:


my $now = DateTime->now;
my $dur = 3->hours + 2->minutes;
$now->add_duration($dur);

This might be a fun birthday gift for DateTime's 5th birthday :)

Thursday January 10, 2008
11:50 PM

Honolulu.pm

My friend Toru Hisai, who has often joined us at Shibuya.pm tech meetings in Tokyo, recently moved to Honolulu, Hawaii, and he's now trying to start a local Perl user group there: Honolulu.pm. Hawaii.pm appears to have been around for a really long time, but it turns out the website is way outdated and the contact address on the site bounces, so I suggested that he start his own.

This might be a significant step for us towards YAPC::Hawaii? hint, hint.

Wednesday November 28, 2007
03:15 AM

SF.pm lightning talk

So I went down to the SF.pm meeting and gave two lightning talks, about Web::Scraper and takesako-san's neat IMG tag hackery. The talks went well, and the other talks were interesting too. Photos are uploaded to Flickr, tagged sf.pm.