All the Perl that's Practical to Extract and Report

Burak (3156)

Burak
  (email not shown publicly)
http://www.burakgursoy.com/

Journal of Burak (3156)

Wednesday September 09, 2009
10:24 PM

reftype() laziness

I was doing a lot of foo() if ref($thing) eq 'ARRAY' or foo() if ref($thing) eq 'HASH' checks (and even something idiotic like foo() if ref($thing) && "$thing" =~ m{.+?=HASH\(0x.+?\)}) inside a pet project of mine, so I thought about doing it in a more elegant way. Something like ref($thing)->array or even ref($thing)->is_array seemed much nicer as syntactic sugar. However, ref() is clearly not the way to go for the implementation, as it fails to identify the underlying type of objects: for a blessed reference it returns the class name. So the obvious choice is Scalar::Util::reftype, and I've named the module Scalar::Util::Reftype. In a way, it's similar to File::stat. By the way, Scalar::Util::reftype has one oddity: unlike CORE::ref, it returns undef if you pass a non-reference to it. So, instead of foo() if reftype($thing) eq 'ARRAY', one must say foo() if defined reftype($thing) && reftype($thing) eq 'ARRAY' or just foo() if reftype($thing) && reftype($thing) eq 'ARRAY' to prevent an annoying warning:


C:\>perl -MScalar::Util=reftype -wle "my $x; print reftype($x) eq 'ARRAY'"
Use of uninitialized value in string eq at -e line 1.
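
One way to avoid repeating the defined() check is to wrap it in a small helper. The following is only a minimal sketch; the name is_type() is made up for this example and is not part of Scalar::Util or of my module:

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype );

# Hypothetical helper: true if $thing is a reference whose underlying
# type is $type. It never warns on non-references or undef.
sub is_type {
    my( $thing, $type ) = @_;
    my $rt = reftype $thing;    # undef if $thing is not a reference
    return ( defined($rt) && $rt eq $type ) ? 1 : 0;
}

print "array\n"       if is_type( [], 'ARRAY' );
print "hash object\n" if is_type( bless( {}, 'Foo' ), 'HASH' );
print "not a ref\n"   if !is_type( 'string', 'ARRAY' );
```

Since reftype() is called only once and the result is tested with defined() first, the "uninitialized value" warning can not be triggered here.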

It also can not detect Regexp (CORE::ref can, as long as it's not blessed):


C:\>perl -MScalar::Util=reftype -wle "my $x = qr//; print reftype $x"
SCALAR

C:\>perl -wle "my $x = qr//; print ref $x"
Regexp

C:\>

Fortunately, perl 5.10 comes with re::is_regexp to detect whether an object is based on a regex or not. But what about older perls? We can remedy the situation with the help of Data::Dump::Streamer::regex under at least perl 5.8.x. Unfortunately, Data::Dump::Streamer seems to fail under anything older than that. I wasn't aware that re::is_regexp was new functionality until I came across the "ref() and Regexp" discussion on PerlMonks.
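
As a rough illustration of the perl 5.10+ route (this is a sketch, not the code in my module), is_regexp can be imported from the re pragma and it keeps working even after the regex is re-blessed into another class. The class name My::Pattern is made up for the example:

```perl
use strict;
use warnings;
use re qw( is_regexp );    # needs perl 5.10 or newer

my $plain  = qr/foo/;
my $object = bless qr/bar/, 'My::Pattern';    # hypothetical class name

# ref() reports the class of the re-blessed regex, so it can no longer
# tell us that the thing is a regex. is_regexp() still can.
print "plain is a regex\n"      if is_regexp( $plain );
print "object is a regex\n"     if is_regexp( $object );
print "arrayref is not a regex\n" if !is_regexp( [] );
```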

I also checked ref documentation. As of perl 5.10 it lists these reference types:


SCALAR
ARRAY
HASH
CODE
REF
GLOB
LVALUE
FORMAT
IO
VSTRING
Regexp

Only perl 5.10's ref seems to detect VSTRING refs, and since they are deprecated and rarely used, the module does not support them. Also, FORMAT is only available in perl 5.8 and newer. But frankly, I can't imagine anyone creating refs/objects based on LVALUE, FORMAT or VSTRING (hmmm... maybe only TheDamian). So, they exist only for the sake of compatibility. For the Regexp type, I've just added a dynamic dependency on Data::Dump::Streamer, which kicks in when Scalar::Util::Reftype is installed under anything older than perl 5.10.

The interface is simple. Just use the module to get a brand new reftype function:


        use Scalar::Util::Reftype;

        foo() if reftype( "string" )->hash; # foo() will never be called
        bar() if reftype( \$var )->scalar; # bar() will be called
        baz() if reftype( [] )->array; # baz() will be called
        xyz() if reftype( sub {} )->array; # xyz() will never be called

        my $obj = bless {}, "Foo";
        my $rt = reftype( $obj );
        $rt->hash; # false
        $rt->hash_object; # true
        $rt->class; # "Foo"

reftype will create an object based on the parameter you specified and it is possible to call test methods on the return value. It currently has these test methods:


scalar
array
hash
code
glob
lvalue
format
ref
io
regexp
scalar_object
array_object
hash_object
code_object
glob_object
lvalue_object
format_object
ref_object
io_object
regexp_object
class

Here, class can be thought of as analogous to Scalar::Util's blessed function. It returns the package/class name of the reference if it happens to be a blessed reference. The rest of the methods test whether the parameter matches the type they are named after.
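
To make the plain/object distinction concrete, here is a stripped-down sketch of how such test methods could be built on top of Scalar::Util. It is an illustration only: the package name My::RefType and the _is() helper are invented for this example, only the array and hash pairs are shown, and the real module is not necessarily written this way.

```perl
use strict;
use warnings;

package My::RefType;    # hypothetical name, not the real module
use Scalar::Util qw( reftype blessed );

sub new {
    my( $class, $thing ) = @_;
    my $type    = reftype $thing;    # undef for non-references
    my $blessed = blessed $thing;    # undef for unblessed values
    return bless {
        type  => ( defined($type) ? $type : '' ),
        class => $blessed,
    }, $class;
}

# Package name of a blessed reference, undef otherwise
sub class { return $_[0]->{class} }

# Shared test: the type must match and "blessedness" must be as requested
sub _is {
    my( $self, $type, $want_object ) = @_;
    return 0 if $self->{type} ne $type;
    my $is_object = defined( $self->{class} ) ? 1 : 0;
    return $is_object == ( $want_object ? 1 : 0 ) ? 1 : 0;
}

sub array        { return $_[0]->_is( 'ARRAY', 0 ) }
sub hash         { return $_[0]->_is( 'HASH',  0 ) }
sub array_object { return $_[0]->_is( 'ARRAY', 1 ) }
sub hash_object  { return $_[0]->_is( 'HASH',  1 ) }

package main;

my $rt = My::RefType->new( bless( {}, 'Foo' ) );
print "plain hash\n"  if $rt->hash;
print "hash object\n" if $rt->hash_object;
print "class: ", $rt->class, "\n";
```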

Oh, I've also overloaded the object that Scalar::Util::Reftype::reftype returns, to be sure that it will not be used in boolean context. If one does such a thing, then the code will suffer the consequences :)
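
The general technique looks roughly like the sketch below: overload the bool conversion with a handler that dies. The package name My::TestResult and the error message are made up for the example and are not taken from the module.

```perl
use strict;
use warnings;

package My::TestResult;    # hypothetical name
use overload 'bool' => sub {
    die "This object can not be used in boolean context. Call a test method instead\n";
};

sub new {
    my $class = shift;
    return bless {}, $class;
}

package main;

my $result = My::TestResult->new;

# Evaluating the object itself as a condition triggers the handler
my $ok = eval { my $flag = $result ? 1 : 0; 1 };
print "Died as intended: $@" if !$ok;
```

This catches the easy mistake of writing foo() if reftype($thing) and forgetting the test method, which would otherwise always be true because an object is a true value.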

Thursday September 03, 2009
11:25 PM

Parse::HTTP::UserAgent: yet another user agent string parser

I was using HTTP::BrowserDetect for a long time. Not because it's a piece of art or accurate, but perhaps because of laziness. When I had some free time, I thought about re-inventing the wheel, like I did several times before. The main reasons for re-inventing are the source code and interface of the module (try to read it, you'll understand) and the lack of new releases. Also, it's not accurate.

There are two other alternatives though: HTML::ParseBrowser and HTTP::DetectUserAgent. The former is really good parser-wise, while the latter is actually a sniffer and does not give you a verbose result.

So, I wrote Parse::HTTP::UserAgent. It tries to be verbose and parse as much as possible from the junk named "User Agent String". It tries to identify the major browsers first and then falls back to minor/old ones with an extended probe. The parsed structure has many fields like:

name               Browser name. You may need to check original_name() if faker (like Maxthon).
version_raw        Browser version
version            version(version_raw)->numify: The float version of the parsed version.
original_name      The original name (i.e.: Maxthon)
original_version   The original version (i.e.: 2.0 (Maxthon))
os                 Operating system. Windows names returned instead of versions
lang               The "user interface" language of the browser
toolkit            [tk_name, tk_version, version(tk_version)->numify]. Gecko, Trident, etc.
dotnet             If it has .NET CLR version in the string, this'll have all versions
mozilla            If a Mozilla browser, returns Moz version: [original, version(original)->numify]
strength           Encryption strength (I guess this does not have much value today)
robot              UA is a robot
extras             Any non-parsable junk. Arrayref.
parser             The name of the parser that returned the result set
generic            Parsed by a generic parser? Bool.
string             The original User Agent String
unknown            User Agent String can not be parsed
device             ***not implemented yet
wap                ***not implemented yet
mobile             ***not implemented yet

The module also has ->as_hash and ->dumper methods for debugging purposes.

The biggest difference is that it parses fakers like Maxthon accurately. It also extracts .NET versions and toolkit names and versions, and it identifies Opera 10 correctly (btw, Opera is the first thing I install on a new system).

The version numbers are converted to decimals to ease comparison (I dislike that major/minor stuff the others implement). The conversion also removes any junk string (like "gold") from the version number. While using version is good, as it handles all the nasty stuff, I got some regression from 5.6.2 smokers after releasing the module. It looks like they (5.6.2) have the pure perl version::vpp (I couldn't compile the xs version under 5.6.1 either) which has some kind of bug. I've opened a ticket about the issue, but also added a workaround to fool version::vpp (postfix '.0' if version is three digits). I currently have no idea about 5.5.x but 5.6.x seem to be fine at least (also tested myself with ActivePerl 5.6.1 on a virtual Windows XP).
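
The idea behind the conversion can be sketched with the version module like this. The helper name numify_version() and the junk-stripping regex are assumptions made for the example; the module's own code (and its version::vpp workaround) is different.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: keep the leading dotted number, drop any trailing
# junk (like "gold") and return the numified version, or nothing at all
# if there is no number to work with.
sub numify_version {
    my $raw = shift;
    return if !defined $raw;
    my( $clean ) = $raw =~ m{ \A \s* v? ( \d+ (?: [.] \d+ )* ) }xms;
    return if !defined $clean;
    return version->new( $clean )->numify;
}

# A plain numeric comparison is now enough
print "3.0.10 is newer than 3.0.9\n"
    if numify_version('3.0.10') > numify_version('3.0.9');
```

Comparing the raw strings "3.0.10" and "3.0.9" as numbers or as strings would give the wrong answer here, which is why the numified form is convenient.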

The module also has some example programs in it for benchmarking. I'll give some figures below. The test system is: Windows Vista Home Premium SP2 32bit & P8600 @ 2.40GHz & ActivePerl 5.10.0.1004

C:\>perl -Ilib eg\bench.pl -c 1000
*** The data integrity is not checked in this run.
*** This is a benchmark for parser speeds.
*** Testing 161 User Agent strings on each module with 1000 iterations each.
 
This may take a while. Please stand by ...
 
          Rate    HTML   HTML2 Browser   Parse  Parse2  Detect
HTML    12.6/s      --     -2%    -63%    -75%    -82%    -90%
HTML2   12.9/s      2%      --    -62%    -75%    -81%    -90%
Browser 34.2/s    170%    166%      --    -33%    -51%    -73%
Parse   51.1/s    304%    297%     50%      --    -26%    -59%
Parse2  69.4/s    449%    439%    103%     36%      --    -44%
Detect   125/s    888%    871%    266%    144%     80%      --
 
The code took: 241.65 wallclock secs (228.21 usr +  0.08 sys = 228.29 CPU)
 
---------------------------------------------------------
 
List of abbreviations:
 
HTML      HTML::ParseBrowser v1
HTML2     HTML::ParseBrowser v1 (re-use the object)
Browser   HTTP::BrowserDetect v0.99
Detect    HTTP::DetectUserAgent v0.01
Parse     Parse::HTTP::UserAgent v0.16
Parse2    Parse::HTTP::UserAgent v0.16 (without extended probe)

HTML::ParseBrowser is slow as hell. Even re-using the object as the doc suggests does not help. It's good that I wasn't aware of the module until now :p HTTP::BrowserDetect is not a good performer either. But its interface is extensive and it's kind of the de facto standard in this area. It tries to match *anything* possible and this choice slows it down (who cares if $ua->win31 is true as of today, right?). HTTP::DetectUserAgent is the speedy one here. It is 80% faster than Parse::HTTP::UserAgent even when the extended probe is disabled, and more than twice as fast when it is enabled. However, it gains this speed with several CAVEATs, as the version number suggests.

C:\>perl -Ilib eg\accuracy.pl
*** This is a test to compare the accuracy of the parsers.
*** The data set is from the test suite. There are 161 UA strings
*** Parse::HTTP::UserAgent will detect all of them
*** A tiny fraction of the regressions can be related to wrong parsing.
*** Equation tests are not performed. Tests are boolean.
 
This may take a while. Please stand by ...
 
----------------------------------------------------------------------------------------------
| Parser                 | Name FAILS     | Version FAILS  | Language FAILS | OS FAILS       |
----------------------------------------------------------------------------------------------
| HTTP::DetectUserAgent  |   27 -  16.77% |   37 -  23.27% |   67 - 100.00% |   35 -  24.31% |
| HTTP::BrowserDetect    |   28 -  17.39% |    8 -   5.03% |   67 - 100.00% |   20 -  13.89% |
| HTML::ParseBrowser     |    0 -   0.00% |    3 -   1.89% |   42 -  62.69% |   19 -  13.19% |
| Parse::HTTP::UserAgent |    0 -   0.00% |    3 -   1.89% |    3 -   4.48% |    4 -   2.78% |
----------------------------------------------------------------------------------------------

Parse::HTTP::UserAgent is not perfect, but at least it seems to be close. HTML::ParseBrowser is just as accurate on name/version matching, but falls behind on language and OS detection. The speedy HTTP::DetectUserAgent seems to be the worst. However, there is one caveat: the test data is from the Parse::HTTP::UserAgent test suite. So, Parse::HTTP::UserAgent is not actually that good yet, since there are some patterns it can not match.

Note: The module is already on CPAN, but you can get the latest code and non-CPAN content from the code repository. The repo also has an etc/Migration.pod for HTTP::BrowserDetect users.

Thursday October 16, 2008
02:45 PM

Having a single version number in all modules in a distro

I was trying to figure out a mechanism to automatically give all modules in a distro (Text::Template::Simple) a single version number instead of versions that vary among files. I'm not so sure if this is the best way, but I chose to modify the files directly to update the versions in them. First, I had to subclass Module::Build to alter the `Build dist` action. However, M::B has an awkward interface for subclassing: one needs to pass the subclass code as a string to the subclass() method. Weirdo :p But since I didn't like this interface and I wanted to keep the syntax checking/coloring of my Komodo Edit, I've decided to load the code from an external file:

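use Module::Build;
use File::Spec; # needed by raw_subclass()
use IO::File;   # needed by raw_subclass()

my $class = Module::Build->subclass(
        class => 'MBSubclass',
        code  => raw_subclass(),
);

# Reads the subclass source from tools/Build.pm and returns it as a string
sub raw_subclass {
        my $file = File::Spec->catfile( 'tools', 'Build.pm' );
        my $FH = IO::File->new;
        $FH->open( $file, 'r' ) or die "Can not open($file): $!";
        my $rv = do { local $/; <$FH> };
        close $FH;
        return $rv;
}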

And here is the subclass (note that there is no package declaration since M::B adds this part automatically afterwards):

use strict;
use vars qw( $VERSION );
use warnings;
use File::Find;
use constant RE_VERSION_LINE => qr{
      \A \$VERSION \s+ = \s+ ["'] (.+?) ['"] ; (.+?) \z
}xms;
use constant VTEMP => q{$VERSION = '%s';};

$VERSION = '0.10';

sub ACTION_dist {
      my $self = shift;
      warn sprintf(
                        "RUNNING 'dist' Action from subclass %s v%s\n",
                        ref($self),
                        $VERSION
                  );
      my @modules;
      find {
            wanted => sub {
                  my $file = $_;
                  return if $file !~ m{ \. pm \z }xms;
                  push @modules, $file;
                  warn "FOUND Module: $file\n";
            },
            no_chdir => 1,
      }, "lib";
      $self->_change_versions( \@modules );
      $self->SUPER::ACTION_dist( @_ );
}

sub _change_versions {
      my $self = shift;
      my $files = shift;
      my $dver = $self->dist_version;

      warn "DISTRO Version: $dver\n";

      foreach my $mod ( @{ $files } ) {
            warn "PROCESSING $mod\n";
            my $new = $mod . '.new';
            open my $RO_FH, '<:raw', $mod or die "Can not open file($mod): $!";
            open my $W_FH , '>:raw', $new or die "Can not open file($new): $!";
            my $changed;
            while ( my $line = readline $RO_FH ) {
                  if ( ! $changed && ( $line =~ RE_VERSION_LINE ) ) {
                          my $oldv = $1;
                          my $remainder = $2;
                          warn "CHANGED Version from $oldv to $dver\n";
                          # Only VTEMP is a format; a '%' in the remainder must stay as is
                          print {$W_FH} sprintf( VTEMP, $dver ), $remainder;
                          $changed++;
                          next;
                  }
                  print $W_FH $line;
            }

            close $RO_FH or die "Can not close file($mod): $!";
            close $W_FH or die "Can not close file($new): $!";

            unlink($mod) || die "Can not remove original module($mod): $!";
            rename( $new, $mod ) || die "Can not rename( $new, $mod ): $!";
            warn "RENAME Successful!\n";
      }

      return;
}

It's really straightforward. Find the *.pm files, then create a modified copy of each that has the distro's version, replace the original with the new one, and resume the `dist` process :)

Friday October 03, 2008
03:00 PM

Text::Template::Simple 0.61 is released

You can get it from CPAN :)

Actually, I've released 0.60 after several development versions, but immediately faced the infamous World-writable Files thingy. While I still don't think this is some serious security breach (compared to allowing arbitrary Makefile.PLs and Build.PLs entering your system), the PAUSE indexer warned me (thanks to Andreas Koenig's recent change) about world-writable "directories" inside my tarball. Since I was not using some 3rd party tar command and was using Module::Build as the toolkit, I thought that this thing would not affect my distro. But I was wrong.

I didn't dig into this much, and both Archive::Tar (which handles archiving) and Module::Build lacked any info regarding this. So, after some quick investigation, as a quick fix, I've modified Module::Build::Base and changed this line (line 3704):

   Archive::Tar->create_archive("$file.tar.gz", 1, @$files);

into this (which no longer adds directories to the tar):

   Archive::Tar->create_archive("$file.tar.gz", 1, grep { !-d $_ } @$files);

which seemed to solve my problem. I even opened a bug in the Module::Build RT Queue. I hope they'll apply this or find a better way to fix the tarball issue. And as I said in the RT bug: I'm surprised that no one in the email thread seems to use this trio as their environment: Windows + Module::Build + Archive::Tar :p

Anyway, let's return to the subject. I've released a new version of Text::Template::Simple and it is kind of a milestone release that includes these new features:

  • Dynamic Includes (a.k.a processed includes)
  • Interpolation in includes
  • Chomping (global & per directive)
  • Template name access through $0
  • Explicit types to compile()

Chomping is similar to what TT has and maybe more. The biggest and trickiest part was the dynamic includes and interpolation in includes. I've implemented that stuff several times before reaching its current status (actually, the same thing happened with chomping). Includes currently miss stuff like parameter passing and applying filters, but I'll add these features eventually. At the moment it is possible to use things like:

<% my $file = "t/data/interpolate_data"; %>
<%* $file . ".tts" %> # dynamic
<%+ $file . ".tts" %> # static

or without interpolation:

<%* t/data/interpolate_data.tts %> # dynamic
<%+ t/data/interpolate_data.tts %> # static

And chomping:

Test
   <%=- $foo -%>
123

Template name access:

   I am <%= $0 %>

See the documentation for more information.

I like TT's features and even have to use it @ $work, but I need a non-mini-language thing. And CPAN is filled with re-invented wheels right? :)

Sunday July 13, 2008
09:32 AM

Hello World

I don't know how long I've been a user in use Perl, but this is my first journal entry. Yay! :)

I've released Sys::Info 0.60 today. It's a milestone version that has mostly compatible Linux & Windows & Unknown (Generic) drivers. The Windows OS driver now supports Windows Server 2008 and a lot of other Windows editions too. Windows Server 2008 support was a little bit tricky since Microsoft did not bump the version number with this OS release (unlike Server 2003) and it has the same version number as Windows Vista. It is only detectable through the editions.

For the Windows driver, my original plan was to drop the WMI interface and use Win32::API. However, I've dropped the Win32::API idea in favor of XS, and WMI turned out to be some huge beast that can not be duplicated easily (or at all). I was lost on MSDN on this subject :p

The Linux driver has improved OS support too. I've implemented several OS meta keys and also tried to mimic the "Edition" information for distros. Currently, only Ubuntu is supported by the edition() method.

There is also a new cdkey() function to return the cdkeys for the OS and Office software. You can guess that this is only meaningful for, and only implemented in, the Windows driver :) It's basically a shortcut to learn the cdkey quickly in case you need it. The original code for that was taken from a PerlMonks thread.

There are some improvements on the CPU detection side too. Hyper Threading detection is improved and it now returns the number of threads if Hyper Threading is in effect. Also, the data structure now has the "architecture" key. However, the load() method has a caveat: it returns the current CPU usage instead of the load average (which does not exist in Windows anyway).

Sys::Info was initiated back in 2004, some time late at night (3 am Eternal :p) while I was supposed to be writing my undergraduate thesis, and evolved from that. The idea was to display the OS & CPU name (and HTTP Server name) correctly in a CMS I was (and still am -- I'm lazy --) working on, but this thing resulted in a module suite. The first CPAN release came two years after that...

I'm trying to create a single and mostly equal interface between the OS specific drivers for system information but it turned out that this is not as easy as I thought back then.

Future plans:

  • Improving current drivers (Windows/Linux/Unknown).
    • Maybe split the drivers into their own distros.
  • Creating a separate distro for Sys::Info::Device::BIOS
  • Adding support for some other devices
  • Create a mechanism for drop-in driver/device system.
  • (In a far far future) Create a *BSD driver (which I didn't even use once 'till now)
  • (In a far far future) If (somehow) I can have access to a Mac(book(\s?Pro|)?); create a MacOSX (darwin?) driver

Patches & Suggestions are welcome as always :)

perl -MSys::Info -wle "sub n{Sys::Info->os->name(@_)} print for n(),n(long=>1),n(edition=>1),n(edition=>1,long=>1)"

Example outputs:

(1)

Windows Server 2008
Windows Server 2008 Service Pack 1 build 6001
Windows Server 2008 Enterprise Edition Full Installation
Windows Server 2008 Enterprise Edition Full Installation Service Pack 1 build 6001

(2)

Windows Vista
Windows Vista Service Pack 1 build 6001
Windows Vista Enterprise Edition
Windows Vista Enterprise Edition Service Pack 1 build 6001

(3)

Windows XP
Windows XP Service Pack 3 build 2600
Windows XP Professional
Windows XP Professional Service Pack 3 build 2600

And here is an Ubuntu output ;)

perl -MSys::Info -wle 'sub n{Sys::Info->os->name(@_)} print for n(),n(long=>1),n(edition=>1),n(edition=>1,long=>1)'

Ubuntu Linux
Ubuntu Linux 8.04 (kernel: 2.6.24-19-generic)
Ubuntu Linux (Hardy Heron)
Ubuntu Linux (Hardy Heron) 8.04 (kernel: 2.6.24-19-generic)