Journal of bart (450)

Sunday December 21, 2008
07:38 PM

Fixing world writable files in tarball before upload to CPAN

Fairly recently, CPAN changed its policy regarding uploaded distributions: if the distribution contains world writable files and/or directories (I'm not entirely clear about its exact rules), then CPAN won't index it.

That is a problem that bites authors who create their distributions on Windows: as Windows doesn't know Unix file permissions, a typical tar on Windows will simply set all file modes to 0777. Well, duh!

Some people have proposed fixes, such as Burak, who claims that if you avoid explicitly mentioning directories when creating the tar file, the problem will not occur.

My idea instead would be to fix the stupid behaviour in tar.

A second best approach, for now, until it gets a definite solution, is to clean up the tarball you just created, going over every file and directory in it, and fix its file mode.

And that's what I did here. I've used Archive::Tar, which turned out to be slightly more problematic than I thought, but I seem to have gotten it to behave. One nasty problem is backward compatibility of the tar files: by default Archive::Tar strips the path away from the file name, and stuffs it in a nonstandard "prefix" field. I've seen tar archive tools trip over this. Setting $Archive::Tar::DO_NOT_USE_PREFIX to 1 stops this behaviour, and you get backward compatible tar files, as long as the full name of the entry (including relative path) is at most 100 ASCII characters long. I do not expect this to be a problem in a typical CPAN upload.

Archive::Tar keeps the entire archive in memory, which may pose a problem for huge tar files, but most likely not for any archive to be uploaded to CPAN.
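
The clean-up step described above could look roughly like the sketch below. It is not the original script: the helper name strip_world_writable, the choice to clear both the group- and world-writable bits (mask 022), and the file name in the usage comment are assumptions made for illustration. It relies on the mode accessor of the Archive::Tar::File objects returned by get_files.

```perl
use strict;
use warnings;
use Archive::Tar;

# Keep the full path in the classic name field instead of "prefix".
$Archive::Tar::DO_NOT_USE_PREFIX = 1;

# Clear the group- and world-writable bits on every entry of an
# Archive::Tar object. Returns the number of entries that were changed.
# (Hypothetical helper; the 022 mask is an assumption.)
sub strip_world_writable {
    my ($tar) = @_;
    my $changed = 0;
    for my $entry ($tar->get_files) {
        my $old = $entry->mode;
        my $new = $old & ~022;    # e.g. 0777 becomes 0755
        next if $new == $old;
        $entry->mode($new);
        $changed++;
    }
    return $changed;
}

# Possible usage (file name is made up):
#   my $tar = Archive::Tar->new('My-Dist-0.01.tar.gz')
#       or die Archive::Tar->error;
#   strip_world_writable($tar);
#   $tar->write('My-Dist-0.01.tar.gz', COMPRESS_GZIP);
```

Because the whole archive is held in memory anyway, rewriting the modes is just a loop over the entries before writing the file back out.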

Friday December 05, 2008
02:49 AM

The Perl Advent Calendar 2008 is up!

Did anybody mention yet that the Perl Advent Calendar 2008 is live? Take a look: one article per day, each introducing a module that is not as well known as it deserves, until Christmas.

Thanks to the hard work of belg4mit and several volunteers, with support from the Boston.pm group.

Tuesday August 19, 2008
11:47 AM

YAPC::EU first impressions

I just came home from my first YAPC, in Copenhagen, a few days ago.

Overall impressions are good: I've had a very busy schedule, I've been to a talk for every single time slot, and there is not a talk I've been to that I considered a waste of time. So that is good.

To be honest: I didn't really expect catering, so that was a pleasant surprise, not least because Denmark is so expensive. An even better surprise was that the quality of the food was good. It was a lot better than anything I've ever been offered to eat on an airplane, for example.

I'm not going to discuss "the incident", even though I was in the middle of it, because I have already largely forgotten about it. It's not that important.

But there still is something that irks me. I feel that some people who have been to YAPC more than once, use it to put themselves in the spotlight. I'm not going to mention names, I'm not even going to say about how many people I am talking, because I do not want this to degenerate into a mindless, meritless flamewar, plus, I am not the one who is going to draw the line of what is or is not acceptable. You have to think about that for yourself.

In short, I do think that the same people should not always be the center of attention, especially people who, I think, don't really deserve it. I think it's time to put a stop to the ego-tripping.

So, for next time... Please don't always let the same people present the show. And don't let all the talks be given by all the same people every year, if they have nothing new to say. It is time for fresh blood.

Wednesday June 25, 2008
06:15 AM

Frustrations about Oracle

I've been a professional user of the Oracle database (now at 10g2) for almost 2 years now. It appears to me to be a very solid and fast database. Its query analysis tools to profile slow SQL queries are excellent. PL/SQL is a nice, "modern" programming language with good features, high speed, and a nice integration of SQL in between procedural programming statements.

Its price does not really bother me... because I don't have to pay for it. I do think that if a company can afford to pay several professional programmers to work with Oracle full-time, they ought to be able to afford Oracle itself too...

And yet, there are a few things that I find rather frustrating...

Oracle comes with PL/SQL libraries for virtually everything that you can think of: fetching web pages over HTTP, sending mail over SMTP, processing XML with XSLT... But these libraries are not exactly bug free.

Take the XSLT processor as an example. I don't know what it is based on... (probably some open source project, as appears to often be the case with Oracle... :)). It claims to be XSLT 2.0 compatible, yet several basic functions (like lower-case) are simply not working. disable-output-escaping does not work, and it always indents the output HTML tree, even when you try to turn it off.

Oracle provides a special way to store XML files, in what it calls "XMLDB". On the surface it's quite impressive... (which is its main purpose, that I can see... :)) You can access it through WebDAV, i.e. using a http URL in a Windows Explorer window, which supports drag and drop to manage the files in it.

And in XSLT, you can access the files in XMLDB using that http URL. But in these files, for example in a document() call, relative URLs to other files simply do not work. You have to use an absolute URL to link to files next to it. That is very impractical, as it requires that you modify the content of the files if you move the repository, or simply, if you add a set of files after you tested them with an XSLT processor on the command line...

What is quite unpleasant, is dealing with VARCHAR2 (strings up to 32k) vs. CLOB (any size) (and their counterparts for binary strings: RAW and BLOB). Those LOBs are a pain to work with: you have to use file-access-like library function calls to work with them, while for VARCHAR2, you can use simple straightforward string manipulation operators just like in other languages. LOBs are just too low level.

And what's really frustrating is that Oracle's PL/SQL libraries virtually all only work with VARCHAR2 and RAW. For example, if you are building a MIME mail message with attachments, there is a library to base64-encode your binary strings at your disposal.... but its parameters and return values are limited to 32k, so you can't use it for larger files than about 24k!
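
The "about 24k" figure follows from base64 turning every 3 input bytes into 4 output characters. The sketch below shows that arithmetic, together with the usual workaround of encoding in chunks whose size is a multiple of 3. It is written in Perl with MIME::Base64 rather than PL/SQL, purely to illustrate the idea; the function names and the 32767 limit used in the comment are assumptions.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Largest input (in bytes) whose base64 form fits in $limit characters,
# ignoring line breaks: every 3 input bytes become 4 output characters.
sub max_base64_input {
    my ($limit) = @_;
    return int($limit / 4) * 3;
}

# Encode a string of any size piece by piece. The chunk size is rounded
# down to a multiple of 3 so that no padding appears in the middle and
# the encoded pieces can simply be concatenated.
sub encode_base64_chunked {
    my ($data, $chunk) = @_;
    $chunk -= $chunk % 3;
    die "chunk size must be at least 3\n" if $chunk < 3;
    my $out = '';
    for (my $pos = 0; $pos < length $data; $pos += $chunk) {
        $out .= encode_base64(substr($data, $pos, $chunk), '');
    }
    return $out;
}

# With a 32767 character limit on the result, max_base64_input(32767)
# gives 24573 bytes, i.e. roughly 24k of input.
```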

For any real-world task that goes beyond a simple demo, every single developer has to begin by spending hours writing routines for simple housekeeping tasks, routines that IMO just ought to be part of a standard library. Sure, you can find example code for about any task on the internet, but it's often of dubious quality.

For example, look at the function replaceClob() on http://www.psoug.org/reference/dbms_lob.html (which is, AFAIK, quite a respectable site). It implements an emulation of the core VARCHAR2 function replace(), but for a CLOB. It works by doing the replacements in one chunk at a time, and then simply concatenates the results. Thus, it will skip matching substrings that overlap the edges between chunks. That is quite a serious bug. This is a typical simple example.
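
To make the chunk-edge problem concrete, here is a small sketch in Perl (not PL/SQL, and not the code from that site). replace_naive mimics the approach described above; replace_chunked is one possible fix that holds back the last length(search) - 1 characters of each buffer, so a match straddling two chunks is still found. Both function names and the tiny chunk size in the example are made up for illustration.

```perl
use strict;
use warnings;

# Replace within each chunk separately: misses matches that span an edge.
sub replace_naive {
    my ($text, $search, $replace, $chunk_size) = @_;
    my $out = '';
    for (my $pos = 0; $pos < length $text; $pos += $chunk_size) {
        my $piece = substr($text, $pos, $chunk_size);
        $piece =~ s/\Q$search\E/$replace/g;
        $out .= $piece;
    }
    return $out;
}

# Same chunked reading, but keep a short tail in the buffer between
# chunks, so that a match starting near the edge is completed later.
sub replace_chunked {
    my ($text, $search, $replace, $chunk_size) = @_;
    die "search string must not be empty\n" unless length $search;
    my $keep = length($search) - 1;
    my ($out, $buf) = ('', '');
    for (my $pos = 0; $pos < length $text; $pos += $chunk_size) {
        $buf .= substr($text, $pos, $chunk_size);
        my $done = 0;    # everything before this offset has been emitted
        while ((my $i = index($buf, $search, $done)) >= 0) {
            $out .= substr($buf, $done, $i - $done) . $replace;
            $done = $i + length $search;
        }
        # Emit all but the tail that could still be the start of a match.
        my $safe = length($buf) - $keep;
        $safe = $done if $safe < $done;
        $out .= substr($buf, $done, $safe - $done);
        $buf = substr($buf, $safe);
    }
    return $out . $buf;
}

# With chunks of 4 characters, "aaabcaaa" is read as "aaab" and "caaa":
#   replace_naive('aaabcaaa', 'abc', 'X', 4)    leaves the text unchanged
#   replace_chunked('aaabcaaa', 'abc', 'X', 4)  finds the match
```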

So not only do developers have to waste hours of time on housekeeping tasks, the results of their efforts are commonly still of rather poor quality. I doubt that the code I wrote for it is so much better. That is a doubly unpleasant price, for whoever has to pay for it.

Friday June 13, 2008
07:02 PM

Images in Spreadsheet files

Earlier this week, somebody who shall remain anonymous, was wondering out loud on the Perlmonks Chatterbox why he can't put an image in a CSV file.

I was simply baffled.

How can anybody claim to be a (junior) programmer, and so totally lack any insight in why this is not possible, or not even imaginable?

I asked him if he even knew what a CSV file was. Yes, it's a text file containing the text to put in fields in a table. Just the text.

It now makes me wonder if he knew what an image was, then.

It's things like that that really make me wonder if there can be any hope at all that anybody like that, may ever pick up enough technical insight and skills, to have any future as a programmer, at all.

Thursday June 05, 2008
03:54 PM

Last-Modified and If-Modified-Since

Recently I have been experimenting with how browsers (all on Windows XP) respond to the presence of a Last-Modified header in the HTTP reply from a web server, in the context of generating semi-static content. I found the response of Firefox 2 most intriguing.

It appears that Firefox doesn't even look at the contents of this header; it just stores it for later. You can put a nonsense string in the Last-Modified header (from the server to the browser), and the next time the browser tries to fetch the file, it'll send the exact same string back in an If-Modified-Since header (from the browser to the server). I used "The bananas in my cellar are still quite green" as a test value, which, I hope you agree, looks nothing like a date. And that is exactly what I got back.

As a result, it acts like a private cookie, but just for this one URL, not for the whole domain, and not even for the siblings of this URL on the same path.

I found Opera 9 apparently behaves the same.

Now, MSIE 7 and Safari are something else. MSIE does appear to look at the contents of this header: it simply drops it if it can't make a date out of it. The format it accepts is quite flexible. I sent it something that's close to ISO-formatted: 'YYYY-MM-DD HH24:MI:SS "GMT"', to put it in Oracle's date formatting terms (for example: '2008-06-05 12:34:56 GMT'). What it sent back was not the same string, but a date string converted to the HTTP standard form: 'Dy, DD-Mon-YYYY HH24:MI:SS "GMT"' (for example: 'Thu, 05-Jun-2008 12:34:56 GMT'), which, BTW, doesn't make sense to me as a standard, looking at it from a date parsing point of view: it's too complex.

Safari takes this even one step further: if the header isn't a date in HTTP standard form, then the header is simply dropped. It simply doesn't send an If-Modified-Since header on the next request.

But it is safe to say that if your date in the Last-Modified header is in HTTP standard form, you will get the exact same string back in the If-Modified-Since header. No browser appears to change the value of the date. It doesn't matter if your clock is off, or you're in the wrong time zone... All that matters is that you'll get the exact same date string back as you sent out.

So, as a rule of thumb, how do I recommend using it? I am now feeling that you ought not try to convert the date back to seconds-since-epoch, or whatever internal format you may be using, and next compare it to the file modification date. Instead, you ought to convert the file modification date to a standard http date string, and compare this string to the If-Modified-Since header. If it's the exact same string, then you may safely send out a "304 Not Modified" header and not much else (no body). If it's a different string, then send the whole file, headers and body, again.

It doesn't matter if your clock is off. All that matters, is that you must be consistent, and always format the same date/time into the same string. And then, it'll just work.

Note that using this scheme, it'll also send out the body if the file's modification date is earlier than the date in the If-Modified-Since header. That's not bad; on the contrary, it's better: all too often I find that someone replaced a file on a webserver with a copy of an older file, and if you only check which date is the latest, then you will miss this change.
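
The rule of thumb above can be sketched in a few lines of Perl. This is only an illustration under assumptions: the function names are made up, and http_date formats the date in the fixed form the HTTP specification prefers (for example 'Thu, 05 Jun 2008 12:34:56 GMT'), built by hand so it does not depend on the locale.

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format seconds-since-epoch as an HTTP date string, always in GMT.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $epoch;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# Decide between 200 and 304 by plain string comparison: the file date
# is converted to a string, the incoming header is never parsed.
# Returns the status code and the Last-Modified value to send.
sub conditional_status {
    my ($file_mtime, $if_modified_since) = @_;
    my $last_modified = http_date($file_mtime);
    my $status = defined $if_modified_since
              && $if_modified_since eq $last_modified ? 304 : 200;
    return ($status, $last_modified);
}

# Possible usage in a handler (the variable names are hypothetical):
#   my ($status, $lm) = conditional_status((stat $path)[9], $ims_header);
#   send headers with Last-Modified: $lm, and the body only if $status == 200
```

Any header value that is not the exact string sent earlier, whether garbage, an older date, or a newer one, simply results in a full 200 response.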

Saturday May 10, 2008
01:39 AM

What I like/dislike about Perl

Inspired by jarich's post (and thus indirectly by brian_d_foy's request) and by a question on arrays on Perlmonks, I've made a list of things I particularly like or dislike about Perl. Sometimes, items in both lists can be caused by the same things, so these things are the result of a compromise, and I don't think they can easily be improved.

Note that these are things from the top of my head, so it's likely that I've forgotten about some stuff that I normally have a huge axe to grind with. So, here goes:

Likes about Perl:

  • Regexps! Hashes!
    Obviously, to me these were Perl's immediate selling points, 12 years ago.
    But, obviously, since then, many other languages have copied both features.
  • Simplicity/transparency of passing arguments to subs, using @_
    Data in @_ is "passed by reference", it's assigning to local variables that makes it "pass by value"
  • Flattening of lists: this makes join(', ', @foo, @bar) possible, and implementing a function min that can be used both for min(@foo) and for min($x, $y, $z)
  • scalar/list context
  • sort function, compared to the mess in PHP (array_multisort)
    Ease to implement Schwartzian Transform, or-cache etc.
  • regular expressions as first class objects, as opposed to languages where a regexp is a string, until you use it as a regexp (i.e. most of them)
  • interpolation (doublequotish strings)
  • garbage collection, using reference counting
  • general syntax: no obsession with uniformity of syntax, the "adapted to humans" ad hoc syntax of Perl generally works very well
  • transparency of implementation, introspection. Extending/modifying Perl from within Perl, is generally quite easy... For example: Hook::LexWrap, Carp, Fatal, Memoize
  • closures
  • map, grep
  • overload
  • The fact that "use" of a module happens on the Perl level, for example import, but also BEGIN (INIT, CHECK)
  • AUTOLOAD for possibly loading modules or generating subs on the fly
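
A few of the points in this list are easier to appreciate with a short sketch. The code below is only an illustration: the sub names are made up, and it shows @_ aliasing, list flattening with a min function, and a Schwartzian Transform.

```perl
use strict;
use warnings;

# Elements of @_ are aliases to the caller's variables, so modifying
# them in place changes the caller's data ("pass by reference").
sub trim_in_place {
    s/^\s+|\s+$//g for @_;
    return;
}

# Thanks to list flattening, the same sub works for min(@foo),
# min(@foo, @bar) and min($x, $y, $z).
sub min {
    my $min = shift;
    for (@_) { $min = $_ if $_ < $min }
    return $min;
}

# Schwartzian Transform: compute the sort key (here the length) once
# per item, sort on it, then strip it off again.
sub sort_by_length {
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ length($_), $_ ] } @_;
}
```

Copying @_ into lexicals with my ($x, $y) = @_; is what turns the first case back into "pass by value".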

Dislikes:

  • nested subs and lexicals in the outer sub just don't work well together (DWIM? Are you kidding me?)

    What I really want, is the way it works in Javascript: inner subs are not visible outside the outer sub, and lexical variables from the outer sub are visible/accessible/shared in the inner subs

    The reason for how it works in Perl, is likely because a BEGIN block is a sub (you may even use the keyword "sub", though nobody does that), that's executed immediately and subs used in a BEGIN block must be considered as global subs.

    As a result, this makes implementing a framework similar to mod_perl, where a plug-in looks like a Perl script but is loaded from file once and next can be called many times, unnecessarily hard.

    Hate, hate, hate!

  • lack of formal parameters
  • complex data structures can be hard, confusion between array and array reference: ['a', 'b', 'c'] is called an "anonymous array" but actually it's an array reference. The fact that Perl distinguishes between arrays and array references can be a good thing, but it has its disadvantages.
  • lack of proper native support for aliasing. For scalars you can fake it with for my $x ($y) { ... } and inside the block, $x is an alias to $y. There's no equivalent trick for aggregates
  • Passing arrays/hashes by reference (of course) means that the sub has to access the array or hash through the reference. The above points make that not easy to remedy.
  • lack of clean way to interpolate functions/method calls in doublequotish strings, @{[...]} is a hack
  • constants, which are argument-less subs, and thus, you can't interpolate them in strings
  • need for $, @, % for variables in not-interpolating context (syntax pollution)
  • local doesn't work with lexical scalars -- it does work on individual items in arrays/hashes, even lexical ones.
  • lack of ordering in hashes, the implicit "same order as insertion order" in PHP/Javascript, and as implemented in the modules Tie::IxHash/Tie::Hash::Indexed, works very well for me.
  • for OO, lack of proper instance variables. Access to attributes is low level and looks ugly, like $obj->{x}. Direct access with $x instead of $obj->{x} would be most welcome, even if only as syntactic sugar.
  • syntax is hard to parse with tools. If you can't modify Perl from within, it's virtually impossible to change it on the source level. Source filters are generally considered a poor idea, because the chance of getting it wrong is huge.
  • lack of embedding of custom "small languages", like SQL. For example, for DBI, I'd really prefer it if the syntax of embedded SQL could be checked at compile time (maybe assuming a broad SQL syntax, even if this particular database doesn't like it). This is not practically feasible because of the previous point (hard to parse syntax / source filters bad).
  • Exceptions are a hack. "eval BLOCK" is a terrible name, only used because "eval STRING" also catches errors... so it's a historical choice, not a functional one. It should have something like try/catch, or even the "on error goto ERRLABEL" from VB.
  • $SIG{__DIE__} is called in eval
  • no functional/chaining versions of s/// and tr/// (as in Javascript)
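
For reference, here is a sketch of the workarounds mentioned in this list, with made-up names. It shows an anonymous inner sub (which, unlike a nested named sub, shares the lexicals of the current call of the outer sub), the for-loop trick to alias a scalar, and the @{[...]} hack to interpolate a function call or a constant.

```perl
use strict;
use warnings;
use constant PI => 3.14159;

# An anonymous inner sub is a closure over this call's $sum and @totals.
# A nested named sub would only see the variables of the first call,
# and perl would warn that the variable "will not stay shared".
sub running_totals {
    my @totals;
    my $sum = 0;
    my $add = sub { $sum += shift; push @totals, $sum };
    $add->($_) for @_;
    return @totals;
}

# Faking a scalar alias: inside the block, $x is another name for $y.
my $y = 10;
for my $x ($y) { $x *= 2 }

# Interpolating a function call and a constant with the @{[...]} hack.
my $message = "Totals: @{[ join '+', running_totals(1, 2, 3) ]}";
my $about   = "pi is roughly @{[ PI ]}";
```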

Hmm, and that's off the top of my head... It has become quite a long list, actually.

Update: I knew I was bound to forget something. So, without delay, the addendum:

Pro:

  • General execution speed (apart from a slight delay at startup)

Contra:

  • Memory footprint, even for tiny scripts: at least several megabytes. It's enough to have me often convert frequently run short scripts into another language with a much smaller footprint.

Thursday May 08, 2008
02:40 PM

LOLcat

I don't follow what's going on in the LOLcat world closely, but when I recently saw this entry, it really made me laugh. It's both so disrespectful and so... ordinary, at the same time.

Enjoy.

All he ever wanted...

Thursday March 06, 2008
11:56 AM

More CPAN.pm silliness

The author used the regex snippet /(?!\n)\Z/ 17 times in the main source, instead of the simpler and equivalent /\z/.

From perlre:

\Z
Match only at end of string, or before newline at the end
\z
Match only at end of string

Duh? In a core module? Doesn't anybody but the maintainers ever check what goes into a core module?
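
The equivalence is easy to check with a small sketch (the helper name is made up): for each test string, both patterns should match at the same position, namely the very end of the string, while a plain \Z may match one character earlier.

```perl
use strict;
use warnings;

# True if /(?!\n)\Z/ and /\z/ match at the same offset in $string.
sub same_match {
    my ($string) = @_;
    my $long_pos  = $string =~ /(?!\n)\Z/ ? $-[0] : -1;
    my $short_pos = $string =~ /\z/       ? $-[0] : -1;
    return $long_pos == $short_pos;
}

# Offset at which a plain \Z matches, for comparison.
sub plain_Z_pos {
    my ($string) = @_;
    return $string =~ /\Z/ ? $-[0] : -1;
}

# For "foo\n", plain \Z matches at offset 3 (before the newline),
# while both (?!\n)\Z and \z match at offset 4 (the real end).
```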

Tuesday March 04, 2008
03:13 PM

CPAN.pm weirdness

The latest official release of CPAN.pm, version 1.9205, contains a null byte in its source file.

I am somewhat surprised that perl doesn't trip over it, as there have been far more innocuous things that have made it stumble in the past, like line endings of another platform (Mac/Unix).

In case you're wondering: it's in sub CPAN::Shell::recent, at the start of the line with contents (that appear in the source only once):

$desc =~ s/.+? - //;

p.s. It still is there in the latest developer release (1.92_57).
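
If you want to look for such stray bytes yourself, a scan along these lines will do. It is a minimal sketch with a made-up function name; it takes the source text as a string and reports the line and column of the first null byte on each affected line.

```perl
use strict;
use warnings;

# Return a list of [line, column] pairs (both 1-based), one for each
# line of $source that contains a null byte.
sub nul_byte_lines {
    my ($source) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\n/, $source, -1) {
        $line_no++;
        my $col = index($line, "\0");
        push @hits, [ $line_no, $col + 1 ] if $col >= 0;
    }
    return @hits;
}

# Possible usage (reading the file in binary mode; path is hypothetical):
#   open my $fh, '<:raw', 'CPAN.pm' or die "Cannot open: $!";
#   my $source = do { local $/; <$fh> };
#   printf "line %d, column %d\n", @$_ for nul_byte_lines($source);
```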