jozef (8299)

Journal of jozef (8299)

Wednesday January 27, 2010
03:56 PM

Illegal character 0x1FFFF

$ perl -le 'use warnings; my $x=chr(0x1FFFF)'
Unicode character 0x1ffff is illegal at -e line 1.

XML supports UTF-8, so I can check that a string is valid UTF-8 and, if it is, use it in XML. Right? No!!!

There are some "non-illegal" characters that are perfectly valid in UTF-8 (or even in plain old ASCII), but are invalid in XML. The most obvious one is 0x00. Here is what the W3C XML 1.0 specification says:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

I spent some time playing with it and the result is XML::Char->valid(). The dev version of Data::asXML is using it now. If you want, have a look at the test suite and try to break it. :-)
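The Char production above translates almost directly into a Perl character class. What follows is only a minimal sketch of such a check, not the actual XML::Char code; the helper name is_xml10_string is made up for this example, and it assumes the input is a decoded character string rather than raw bytes.

```perl
use strict;
use warnings;

# Character class copied from production [2] (Char) of the XML 1.0 spec.
my $XML10_CHAR = qr/[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# Hypothetical helper (not part of XML::Char): returns 1 if every
# character of the decoded string is an XML 1.0 Char, 0 otherwise.
sub is_xml10_string {
    my ($string) = @_;
    return $string =~ /\A$XML10_CHAR*\z/ ? 1 : 0;
}

print is_xml10_string("tab\tand newline\n") ? "valid\n" : "invalid\n";   # prints "valid"
print is_xml10_string("null\x00byte")       ? "valid\n" : "invalid\n";   # prints "invalid"
```

Tab, line feed and carriage return are the only control characters below 0x20 that pass, which is exactly the case a plain UTF-8 validity check misses.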

Thursday January 21, 2010
12:37 PM

My first Perl6 regexp grammar in Perl

Once I told potyl - "hey, let's have a wiki, it will be easy to use for everyone". He wasn't excited, not at all. Why? What is so bad about a wiki? Look at this table of 130+ wiki syntaxes. Anyone still complaining that there are too many similar choices on CPAN? The wiki community decided to solve the problem by creating yet another wiki syntax...

What does this have to do with Perl6 regexp grammars? After Damian's talk at YAPC::EU::2009 I really wanted to try out Regexp::Grammars. Finally I found some time during Christmas and here is the result:

use Regexp::Grammars;
use re 'eval';
my $parser = qr@
    # assumed start pattern, missing from the listing
    <Wiki>

    <rule: Wiki> <[Block]>*
    <rule: Block> <.EmptyLine> | <Code> | <Para>
    <token: Para> <Heading> | <List> | <TextLines>
    <token: EmptyLine> ^ \h* \R
    <token: TextLines> (?:^ (?! <Code> | <Heading> | <List> | <EmptyLine> ) [^\h] .+? \v)+
    <token: CodeStart> ^ {{{ \h* \v
    <token: CodeEnd> ^ }}} \h* \v
    <token: Code> <.CodeStart> <CodeLines> <.CodeEnd>
    <token: CodeLines> .+?
    <token: Heading> <HeadingStart> \s <HeadingText> \s =+ \h* \v
    <token: HeadingStart> ^=+
    <token: HeadingText> [^=\v]+
    <token: List> <[ListItem]>+
    <token: ListItem> ^ <ListItemSpaces> <ListItemType> \h+ <ListItemText> \v
    <token: ListItemSpaces> \h+
    <token: ListItemType> (\*|\d+\.|a\.|i\.)
    <token: ListItemText> .+?
@xms;   # closing delimiter and flags assumed, the listing was cut off here

It is probably not the best piece of regexp grammar, and Perl6 experts will surely spot some errors, but hey, it works! "Works on my computer". I've used it to transform TracWiki syntax to XHTML divs and then, using XSL, to DokuWiki syntax. Here are the scripts that do it completely.

Tuesday January 12, 2010
02:24 PM

we lost three men :-/

It is only the beginning of 2010 and we have already lost 3 men. Who? Why?

  1. Emmanuel - a Perl guru now finally leaving Bratislava for Amsterdam
  2. Jonathan - a Rakudo (and many other programming or human spoken languages) guru leaving for Sweden
  3. Martin - a network guru and manager who programs in Perl, leaving for Sweden too

I would like to officially say "Thank you!" to all three of them for what they have done and brought to the group.

Thursday January 07, 2010
04:24 PM

Wide screen? Let the search engines fight!

Google vs Cuil vs Yahoo vs ...

A wide screen offers a lot of horizontal space, so why not use it and search with two (or more) search engines at the same time? Try (about).

Where is the Perl in there? It's in the back. In the name of "static can be more", the files are pure HTML+CSS+JS generated using Template Toolkit ttree and some Makefile rules.

The page is basically an input box and 1 or more iframes with search engines. Submitting text in the input box reloads each search engine iframe with a search URL for that text.

Why was it created? I got an idea how to do web searches in a slightly different way. But then I realized - who the hell will try just another search engine??? Well, no one of course :-), but if there is a way to keep Google (or any other major search engine) next to the new one, maybe there will be a little chance... I will probably never have enough time, money, energy and the rest one needs to start and finish such a project, but at least now there is a way to compare how good the current search engines are - side by side.

The few days I've spent playing and searching with multiple search engines at once made me realize that, yes - the search engines are different. And yes - the other (than Google in my case) search engines can sometimes give better results. :-)
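The real page does this in the browser with JavaScript, but the idea of "one query, one URL per engine" can be sketched in Perl too. The URL prefixes, the %ENGINE_URL table and both helper names below are assumptions made for illustration, not taken from the actual page.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed search URL prefixes, for illustration only.
my %ENGINE_URL = (
    google => 'https://www.google.com/search?q=',
    yahoo  => 'https://search.yahoo.com/search?p=',
);

# Percent-encode the query as UTF-8; RFC 3986 unreserved characters stay as-is.
sub escape_query {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Hypothetical helper: one URL per engine, i.e. one iframe "src" each.
sub search_urls {
    my ($query, @engines) = @_;
    my $escaped = escape_query($query);
    return map { $ENGINE_URL{$_} . $escaped } @engines;
}

print "$_\n" for search_urls('perl unicode', qw(google yahoo));
```

Each returned URL would be assigned to the "src" of one iframe, so a single submit refreshes all engines at once.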

Monday January 04, 2010
02:44 PM

I feel like no one ever told the truth to me...

I feel like no one ever told the truth to me
About growing up and what a struggle it would be
In my tangled state of mind,
I've been looking back to find where I went wrong.
-- Queen - Too Much Love Will Kill You

There are people who say reinventing the wheel is bad, that it shouldn't be done. That we should find an existing project and contribute there. Some even say that there are too many variants and choices of CPAN modules that do the same thing, and that this is wrong. That it is counterproductive and scary for the newbies. There are people who call themselves the CPAN police and hunt down new uploaders, trying to show them how many mistakes they made...

Now look at the kids. They experience deferred success 1000 times a day. Even if they don't fail, they play most of the day. They play by repeating, copying from adults or other kids. They speak the same sentences wrongly over and over. They do the same things over and over. They fail over and over. They are just kids. Everyone knows that this is how they learn. Next to the kids there is always some adult. While the kids are really small, the adults seem perfect to them, because they just do everything perfectly.

After growing up, one day, kids find out that the adults are not perfect. They don't always do the right things. And they don't know everything. The trouble is that there are many adults who think that they are perfect. But that is a different topic. Let's go back to childhood.

To be precise the Perl programmer childhood. Perl programmer life. The difference is everyone is free to be born to the Perl world and grow up here. The other nice thing is that everyone is free to leave it and go and live a different life.

/me a Perl kid. I like to play, I like to try out things. I like to reinvent the wheel over and over. I don't mind that there are Perl grown-ups that do the same thing much better than I do. I don't mind that I will hit the ground while doing weird experiments. And? It's fun and everybody has to fall the first time. (and second and third and ...) And I'm just a kid!

The Perl world is different from "reality". The biggest difference is that it's hard to see the age. Everyone is growing at a different rate and some will never grow up - like me :-P

So I would like to encourage all kids to come play with toys, throw them away if they don't like them any more and not be ashamed that they "just" built "another" sandcastle. You can always destroy it and build another one, can't you?

Now back to the desert of the real. There is plenty of legacy Perl code everywhere around us. Legacy sometimes means undocumented, unmaintained, badly written or just not understood. Sending bad words and blaming the people who wrote it will not fix the situation. Everyone is doing the best he can, considering his experience, mood, moon phase, weather conditions, ... at the time of writing the code. If it gets the job done, it is good. If it makes someone happy writing it, it is even better. And if there is no replacement, it is the best code ever!

Tuesday December 29, 2009
11:59 AM

Just put it where it belongs

Still there after reading this and this? What I would like to show next is Module::Build::SysPath and its usage for CPAN authors. What are the features?

  • pre/post install paths (dev+test/prod)
  • system paths configured once when Sys::Path is installed
  • "Debian way" of handling configuration files
  • automatic copying of all files
  • creating of empty folders (for state, log, etc. files)

The main reason for its existence is to install files (configuration, templates, data files, ...) where the FHS recommends and also to know where to put temporary files (like state, lock, cache, ...).

There are a couple of stages that a CPAN distribution goes through: development, smoke testing, pre-install and install. In all of those it is good to know where our files are. :-P Before the final install the files are always inside the distribution folder; only after the install are the files in the system folders. Therefore Module::Build::SysPath requires a special module YourModuleName::SPc that gets overwritten with values from Sys::Path once installed.

There are, as always, a couple of other ways to store and work with the non-Perl-module files. Here are some that I'm aware of:

  • storing data inside __DATA__ in .pm file
  • storing files in the same folder as .pm files
  • storing files in auto/ - File::ShareDir
  • hardcoded system folders
  • use Sys::Path paths directly
  • ???

Everyone can choose.

Tuesday December 22, 2009
08:38 AM

use Config; # discover your path

The reasoning why knowing the system paths is a good idea can be found here. Now let's see how one can guess (get?) them in Perl.

There are a lot of Perl installations made to non-standard paths. Like:

  • /usr/local/bin/perl
  • /home/$USERNAME/local/bin/perl
  • C:\Strawberry\bin\perl
  • /opt/bin/perl

I was trying to find out what these installations share and what could be used to find out if the Perl is in the file hierarchy standard folder - /usr/bin/perl. The simplest way that I found was to:

use Config;
if ($Config::Config{'prefix'} eq '/usr') { ... do stuff ... }

That is basically all Sys::Path does. If the installation prefix was set to "/usr" (in the standard Perl packages of all Linux distributions it is), then the rest of the system paths are by default set to FHS. If not, all of the paths are based on the prefix - $PREFIX/etc, etc.
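The prefix rule can be sketched in a few lines. This is not Sys::Path's actual code: the helper name, the hash keys and the exact layout chosen under a non-standard prefix (beyond $PREFIX/etc) are assumptions made for this example.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper (not the Sys::Path API): derive a few system
# directories from the prefix Perl was installed with.
sub paths_for_prefix {
    my ($prefix) = @_;

    # Standard distribution Perl: use the FHS locations directly.
    if ($prefix eq '/usr') {
        return (
            sysconfdir => '/etc',
            logdir     => '/var/log',
            cachedir   => '/var/cache',
            datadir    => '/usr/share',
        );
    }

    # Anything else: keep everything under the prefix (layout assumed).
    return (
        sysconfdir => "$prefix/etc",
        logdir     => "$prefix/var/log",
        cachedir   => "$prefix/var/cache",
        datadir    => "$prefix/share",
    );
}

my %path = paths_for_prefix($Config{prefix});
print "$_: $path{$_}\n" for sort keys %path;
```

Running it with the system Perl and then with a Perl built under a home directory shows the two branches side by side.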

Just check the Pod of Sys::Path for all the different paths and their usage.

too simple? but works :-)

Tuesday December 15, 2009
03:18 PM

Once upon a time there was a sysadmin

Once upon a time I was a young system administrator. Having all the strange looking /usr, /var, /etc, ... all around me was scary and I was not sure what to "do" with all those folder trees. At some point I started to compile the extra programs that I needed. With the default prefix everything ended up in /usr/local, which looked safe to me. I knew that my stuff was there and the mysterious system stuff was everywhere else.

Well, it worked. Having to maintain some more servers later, I started to do some packaging. I was using Slackware at that time and the Slackware packaging system was really simple - just a tarball that got extracted to the root of the file system, with some scripts that got executed at package installation time. Simple and it worked for me. Still I kept the stuff in /usr/local to be on the safe side.

And time passed :-) and I'm using Debian now, but the most important change is that I've lost all my respect for the file system and I've learned where and why to put files. Why use /etc for configs, /var/log for logfiles, /var/lib for state files, /var/cache for cache files, /usr/share for templates or static data files, etc.? Simply because it is standard. Because it is standard, a standard compliant distribution will stick to it. Besides being standard there are some really good reasons too. Helper tools will understand the files and then act based on the file's category: automatically rotating logs, backing up the important (non static distribution) files, cleaning up the temporary files etc. Well, and there are also humans out there. Co-admins or newcomers who will log in to the machine and look for files or troubleshoot the programs. Knowing where to look for stuff really helps!

With today's advances in virtualization techniques there is no reason to mix too many things (projects) on one server. So there is no reason to play safe with the paths, and files should be put in the right place where they belong - FHS.

(to be continued with Perl part of the story...)

Friday December 11, 2009
05:52 AM

Installing CPAN module dependencies listed in META.yml

Have you ever had a Perl distribution extracted and wanted to install all its dependencies afterwards? You can do it using:

cpan -i Test::Install::METArequires
perl -MTest::iMETAr -le 'Test::iMETAr->t'

The output will be TAP, one test per dependency. Is there some other (better) way to do it? There should be, as always...
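One other way, sketched here under assumptions, is to read the dependency list with CPAN::Meta and hand it to whatever CPAN client is at hand. This is not how Test::Install::METArequires works internally; the helper name runtime_requires and the inline META.yml content are made up for the example.

```perl
use strict;
use warnings;
use CPAN::Meta;

# Hypothetical helper: return { Module => version } for the runtime
# "requires" prerequisites of a loaded CPAN::Meta object.
sub runtime_requires {
    my ($meta) = @_;
    return $meta->effective_prereqs
                ->requirements_for('runtime', 'requires')
                ->as_string_hash;
}

# A tiny META.yml, invented for this example. For a real distribution
# CPAN::Meta->load_file('META.yml') would be used instead.
my $yaml = <<'END_YAML';
---
name: Example-Dist
version: 0.01
abstract: example only
author:
  - A. U. Thor <author@example.com>
license: perl
requires:
  JSON::XS: 2.0
  Moose: 0
meta-spec:
  version: 1.4
  url: http://module-build.sourceforge.net/META-spec-v1.4.html
END_YAML

my $meta     = CPAN::Meta->load_yaml_string($yaml);
my $requires = runtime_requires($meta);

# One "Module version" line per dependency.
print "$_ $requires->{$_}\n" for sort keys %$requires;
```

The printed module names could then be passed to a CPAN client for installation, one per dependency, much like the one-test-per-dependency TAP output.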

Thursday December 10, 2009
02:22 PM

Re: when virtualization is too expensive

A reply to two comments on "when virtualization is too expensive".

The 1st was from grantm. Thanks for pointing out OpenVZ. It's a much more professional tool than debootstrapping and chrooting, but comes with more configuration and setup complexity. With OpenVZ it's possible to share memory, which is the resource that we never have enough of.

The 2nd was from Alias. Alias is pointing out the deduplication of disk data. Disk space is kind of cheap these days compared to memory, bandwidth and CPU processing costs when dealing with huge data. Besides saving disk space, deduplication can offer better cache efficiency, which will increase disk read speed, which is never fast enough. ZFS should soon (first quarter of 2010) support deduplication. It seems that there will be no native ZFS support for Linux because of the licencing problems :-( . An easy Linux deduplication that can come in handy is cowdancer. `cp -al some-folder/ some-folder.copy && cd some-folder.copy && cow-shell` will copy "some-folder" by creating hardlinks, which is really fast. Then any write operation under this shell and folder will result in removing the hardlink and creating a new file => copy-on-write. This works well with chroot-ed systems to test new Perl module versions or do some other experiments.
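The copy-on-write effect itself can be imitated in a few lines of Perl. This is only an illustration of "break the hardlink before writing", not how cowdancer is implemented; the helper name cow_append and the file names are made up, and it assumes a file system with hardlink support.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical helper: append to $file, first replacing it with a
# private copy if other names still point at the same inode.
sub cow_append {
    my ($file, $text) = @_;
    my $nlink = (stat $file)[3];
    if ($nlink > 1) {
        my $private = "$file.cow.$$";
        copy($file, $private) or die "copy failed: $!";
        rename $private, $file or die "rename failed: $!";
    }
    open my $fh, '>>', $file or die "open failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return;
}

my $dir = tempdir(CLEANUP => 1);
my ($orig, $copy) = ("$dir/orig.txt", "$dir/copy.txt");

open my $out, '>', $orig or die "open failed: $!";
print {$out} "shared\n";
close $out or die "close failed: $!";

# Hardlink instead of copying the data, as `cp -al` does for each file.
link $orig, $copy or die "link failed: $!";
cow_append($copy, "changed\n");

printf "same inode after write: %s\n",
    (stat $orig)[1] == (stat $copy)[1] ? 'yes' : 'no';   # prints "no"
```

The original file keeps its content and inode, while the written-to name now points to its own new file, which is what makes the hardlinked copy safe to experiment in.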