
All the Perl that's Practical to Extract and Report

use Perl

hanekomu (8123)

http://hanekomu.at/blog/

Go (Baduk) player and Perl hacker.

Journal of hanekomu (8123)

Sunday May 18, 2008
07:16 AM

use 箆; or: Distributions with Kanji on CPAN

[ #36459 ]

During day 2 of the YAPC::Asia hackathon, we discovered a kanji that we could use for Moose.pm: 箆. It even looks a bit like something with antlers. Here is the JEDict definition:

箆 [へら: HERA] spatula
箆鹿 [へらじか: HERAJIKA] (uk) moose, elk

So a moose is an animal with a spatula on its head? Larry said that 箆 could also mean "comb", so a moose is an animal with a comb on its head. But I digress.

I've made a simple proof-of-concept distribution. The main module's code basically is:

        use utf8;
        package 箆;
        1;

'perl Makefile.PL', 'make', 'make test' and 'make dist' all worked well. It produced a tarball: 箆-0.01.tar.gz. So far, so good.

Then I tried to upload the tarball to CPAN:

        $ cpan-upload-http 箆-0.01.tar.gz

That also gave no error message. So I waited for the emails from PAUSE. Here is the first one:

> Subject: Notification from PAUSE
>
> MARCEL (Marcel Grünauer == hanekomu (跳ね込む)) visited the PAUSE and
> requested an upload into his/her directory. The request used the
> following parameters:
>
> pause99_add_uri_upload            [箆-0.01.tar.gz]
> SUBMIT_pause99_add_uri_httpupload [ Upload this file from my disk ]
> pause99_add_uri_httpupload        [-0.01.tar.gz]
>
> The request is now entered into the database where the PAUSE daemon will
> pick it up as soon as possible (usually 1-2 minutes).
>
> During upload you can watch the logfile in
> https://pause.perl.org/pause/authenquery?ACTION=tail_logfile&pause99_tail_logfile_1=5000.
>
> You'll be notified as soon as the upload has succeeded, and if the
> uploaded package contains modules, you'll get another notification from
> the indexer a little later (usually within 1 hour).
>
>
> Thanks for your contribution,
> --
> The PAUSE

Due to the good fortune of having kanji and kana in my CPAN display name (跳ね込む), I was sure that the email had the right encoding. But the tarball filename it returned was wrong: "箆-0.01.tar.gz". Well, let's see. Maybe it's just a problem with generating the email. Wait and see... Here is the second email:

> Subject: CPAN Upload: M/MA/MARCEL/-0.01.tar.gz
>
> The uploaded file
>
>    -0.01.tar.gz
>
> has entered CPAN as
>
>  file: $CPAN/authors/id/M/MA/MARCEL/-0.01.tar.gz
>  size: 24392 bytes
>   md5: 39ebd0b5ab4bd9acdb2a80c1c4a733dd
>
> No action is required on your part
> Request entered by: MARCEL (Marcel Grünauer == hanekomu (跳ね込む))
> Request entered on: Sun, 18 May 2008 11:00:42 GMT
> Request completed:  Sun, 18 May 2008 11:00:52 GMT
>
> Thanks,
> --
> paused, v996

Hm, doesn't look good either. Well, wait for the indexer report... Here it is:

> Subject: Failed: PAUSE indexer report MARCEL/-0.01.tar.gz
>
> The following report has been written by the PAUSE namespace indexer.
> Please contact modules@perl.org if there are any open questions.
>  Id: mldistwatch 1001 2008-05-15 05:34:01Z k
>
>               User: MARCEL (Marcel Gruenauer == hanekomu)
>  Distribution file: -0.01.tar.gz
>    Number of files: 21
>         *.pm files: 14
>             README: 箆-0.01/README
>           META.yml: 箆-0.01/META.yml
>  Timestamp of file: Sun May 18 11:00:51 2008 UTC
>   Time of this run: Sun May 18 11:02:23 2008 UTC
>
> No package statements could be
>                     found in the distro (maybe a script or
>                     documentation distribution?)
>
> __END__

Ah, so it could actually untar it because it found the README and META.yml files. But it seems it uses the wrong regex to find the package statements. Maybe it needs to read the file as utf8...
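To illustrate that guess, here is a minimal sketch of how a package-matching regex can miss a kanji package name when the source is treated as raw bytes. The pattern `$package_re` is an assumption made up for illustration, not the code the PAUSE indexer actually uses. The point is only that `\w` does not match the UTF-8 bytes of 箆 in an undecoded string, while it does match the decoded character.

```perl
use strict;
use warnings;
use utf8;                          # this file contains a literal kanji
use Encode qw(encode decode);

# Assumed pattern for illustration only; NOT taken from the PAUSE indexer.
my $package_re = qr/^\s*package\s+([\w:]+)\s*;/m;

# The module source as characters, and as the UTF-8 bytes stored on disk.
my $source_chars = "use utf8;\npackage 箆;\n1;\n";
my $source_bytes = encode('UTF-8', $source_chars);

# Matching against raw bytes: the high bytes are not \w, so nothing is found.
my ($from_bytes) = $source_bytes =~ $package_re;

# Matching after decoding: 箆 is a single word character, so the name is found.
my $decoded      = decode('UTF-8', $source_bytes);
my ($from_chars) = $decoded =~ $package_re;

binmode STDOUT, ':encoding(UTF-8)';
print 'raw bytes: ', (defined $from_bytes ? $from_bytes : 'no package found'), "\n";
print 'decoded:   ', (defined $from_chars ? $from_chars : 'no package found'), "\n";
```

If the indexer does something along these lines, decoding files as UTF-8 before scanning them might be enough to find the package statement, but that remains a guess.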

Just to be sure it wasn't a problem with cpan-upload-http, I also uploaded the file from the PAUSE web interface directly. Here is the response:

> The Perl Authors Upload Server
>
> Upload a file to CPAN
> Add a file for MARCEL
> File successfully copied to '/home/ftp/incoming/-0.01.tar.gz'
> Your filename has been altered as it contained characters besides the class [A-Za-z0-9_\-\.\@\+].
> DEBUG: your filename['箆-0.01.tar.gz'] corrected filename['-0.01.tar.gz'].

So this doesn't work either.
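The correction in that DEBUG line can be reproduced with a one-line substitution. This is only a sketch: `sanitize_filename` is a hypothetical helper, and the only thing taken from PAUSE is the character class quoted in its message. How PAUSE really implements the check is not shown here.

```perl
use strict;
use warnings;
use utf8;                          # this file contains a literal kanji

# Hypothetical helper: remove every character outside the class that
# PAUSE quoted in its message, [A-Za-z0-9_\-\.\@\+].
sub sanitize_filename {
    my ($name) = @_;
    (my $clean = $name) =~ s/[^A-Za-z0-9_\-\.\@\+]//g;
    return $clean;
}

my $original  = '箆-0.01.tar.gz';
my $corrected = sanitize_filename($original);

binmode STDOUT, ':encoding(UTF-8)';
print "your filename['$original'] corrected filename['$corrected']\n";
# The corrected name is '-0.01.tar.gz': only the kanji is dropped.
```

With a name that consists of a single kanji, nothing is left of the distribution name after the substitution, which matches the '-0.01.tar.gz' seen in all three emails.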

Dear CPAN maintainers, please fix it.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Fortunately CPAN filenames are 7-bit, and only a few non-\w characters are allowed in them. As much as I'd like to change that, CPAN has to be the advocate for the limitations faced by the last poor soul in the chain who wants to use CPAN.

    I'm sure you know how to give the tarball a different name. I'm not so sure that perl is able to deal with UTF-8 module names. So please bring your first UTF-8 module into the core. I'm looking forward to seeing how it fares.

    • Regarding UTF8 variable names, the core doesn't support them very well. See this item in perltodo.pod :

      =head2 Properly Unicode safe tokeniser and pads.

      The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
      variable names are stored in stashes as raw bytes, without the utf-8 flag
      set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
      tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
      source filters. All this could be fixed.
      • But package names work quite OK. I did a similar experiment at nearly the same time with Acme::Ünicöde. Locally on disk it just works fine. Once uploaded, it is even installable via CPAN.pm by the dist name (e.g. S/SC/SCHWIGON/acme-unicode/Acme-nicde-0.02.tar.gz). The filename problem is solvable anyhow during/after "make dist".

        The only remaining problem is getting the PAUSE indexer to use the META.yml instead of trying its luck on the files and to generate that META.yml with some mor

  • Another name, besides 麋鹿, is 駝鹿. That is, literally, 'camel deer.'

    Slaps forehead.

    Of course! What other translation explains Moose's power?

    But according to some, 麋 is the character for Père David's Deer [wikipedia.org]. See this explanation on a Chinese hunting site [chinahunts.com].

    The Chinese Wikipedia [wikipedia.org] says it's because the shoulders and buttocks of the 2 animals are similar.

    There's another character [fileformat.info] that means the same thing. I can't