All the Perl that's Practical to Extract and Report

use Perl

Journal of bart (450)

Wednesday June 25, 2008
07:15 AM

Frustrations about Oracle

I've been a professional user of the Oracle database (now at 10g2) for almost 2 years now. It appears to me to be a very solid and fast database. Its query analysis tools to profile slow SQL queries are excellent. PL/SQL is a nice, "modern" programming language with good features, high speed, and a nice integration of SQL in between procedural programming statements.

Its price does not really bother me... because I don't have to pay for it. I do think that if a company can afford to pay several professional programmers to work with Oracle full-time, they ought to be able to afford Oracle itself too...

And yet, there are a few things that I find rather frustrating...

Oracle comes with PL/SQL libraries for virtually everything that you can think of: fetching web pages over HTTP, sending mail over SMTP, processing XML with XSLT... But these libraries are not exactly bug free.

Take the XSLT processor as an example. I don't know what it is based on... (probably some open source project, as appears to often be the case with Oracle... :)). It claims to be XSLT 2.0 compatible, yet several basic functions (like lower-case()) simply do not work. disable-output-escaping does not work either, and it always indents the output HTML tree, even when you try to turn that off.

Oracle provides a special way to store XML files, in what it calls "XMLDB". On the surface it's quite impressive... (which is its main purpose, as far as I can see... :)) You can access it through WebDAV, i.e. using an HTTP URL in a Windows Explorer window, which supports drag and drop to manage the files in it.

And in XSLT, you can access the files in XMLDB using that HTTP URL. But in these files, for example in a document() call, relative URLs to other files simply do not work. You have to use an absolute URL even to link to a file right next to the current one. That is very impractical, as it requires that you modify the content of the files if you move the repository, or simply if you add a set of files after you tested them with an XSLT processor on the command line...

What is quite unpleasant is dealing with VARCHAR2 (strings up to 32k) vs. CLOB (any size), and their counterparts for binary strings: RAW and BLOB. Those LOBs are a pain to work with: you have to use file-access-like library function calls to work with them, while for VARCHAR2, you can use simple, straightforward string manipulation operators just like in other languages. LOBs are just too low-level.

And what's really frustrating is that virtually all of Oracle's PL/SQL libraries only work with VARCHAR2 and RAW. For example, if you are building a MIME mail message with attachments, there is a library to base64-encode your binary strings at your disposal... but its parameters and return values are limited to 32k, and since base64 output is a third larger than its input, you can't use it for files larger than about 24k!
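The usual workaround is to encode in pieces. Because base64 turns every 3 input bytes into 4 output characters, chunks whose size is a multiple of 3 can be encoded independently and the results concatenated, with no padding appearing mid-stream. A minimal sketch in Perl, using the core MIME::Base64 module rather than Oracle's PL/SQL packages; the sub name and the chunk size are illustrative assumptions, and the same idea would apply when reading a BLOB piece by piece.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: base64-encode arbitrarily large data in bounded pieces.
# Each piece must be a multiple of 3 bytes, so that '=' padding can only
# appear at the very end of the combined output.
sub encode_base64_chunked {
    my ($data, $chunk_size) = @_;
    $chunk_size -= $chunk_size % 3;          # round down to a multiple of 3
    die "chunk size too small\n" if $chunk_size <= 0;

    my $out = '';
    for (my $pos = 0; $pos < length($data); $pos += $chunk_size) {
        # The second argument '' suppresses line breaks; a MIME mail body
        # would still need to be wrapped at 76 characters afterwards.
        $out .= encode_base64(substr($data, $pos, $chunk_size), '');
    }
    return $out;
}

my $binary  = join '', map { chr($_ % 256) } 0 .. 99_999;   # 100,000 bytes
my $encoded = encode_base64_chunked($binary, 24_000);
print length($encoded), " characters\n";
```

With the encoded string capped at 32,767 characters, and ignoring any line breaks the encoder might insert, the largest input a single call can handle is 8,191 × 3 = 24,573 bytes, which matches the "about 24k" above.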

For any real-world task that goes beyond a simple demo, every single developer has to begin by spending hours writing routines for simple housekeeping tasks, routines that IMO just ought to be part of a standard library. Sure, you can find example code for almost any task on the internet, but it's often of dubious quality.

For example, look at the function replaceClob() on http://www.psoug.org/reference/dbms_lob.html (which is, AFAIK, quite a respectable site). It implements an emulation of the core VARCHAR2 function replace(), but for a CLOB. It works by doing the replacements one chunk at a time, and then simply concatenating the results. Thus, it will miss matching substrings that straddle the boundary between two chunks. That is quite a serious bug. And this is a typical example, and a simple one.
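A correct chunked replace has to hold back the last length(search) - 1 characters of each chunk, because a match could start there and end in the next chunk. The sketch below shows the difference in Perl, simulating a large value by cutting an ordinary string into fixed-size pieces. It is neither psoug.org's code nor a PL/SQL implementation; both sub names are hypothetical.

```perl
use strict;
use warnings;

# The flawed approach: replace within each chunk independently.
# A match that straddles two chunks is never seen.
sub replace_per_chunk {
    my ($text, $search, $repl, $chunk_size) = @_;
    my $out = '';
    for (my $pos = 0; $pos < length($text); $pos += $chunk_size) {
        (my $chunk = substr($text, $pos, $chunk_size)) =~ s/\Q$search\E/$repl/g;
        $out .= $chunk;
    }
    return $out;
}

# The fix: carry over the tail of each chunk that might still begin a match.
sub replace_chunked {
    my ($text, $search, $repl, $chunk_size) = @_;
    die "empty search string\n" unless length $search;
    my $keep = length($search) - 1;   # a match starting in this tail can't be complete yet
    my ($out, $buf) = ('', '');
    for (my $pos = 0; $pos < length($text); $pos += $chunk_size) {
        $buf .= substr($text, $pos, $chunk_size);
        # Replace every complete match now in the buffer, left to right.
        my $i = 0;
        while ((my $hit = index($buf, $search, $i)) >= 0) {
            $out .= substr($buf, $i, $hit - $i) . $repl;
            $i = $hit + length $search;
        }
        # Emit what can no longer be part of any match; keep the rest.
        my $safe = length($buf) - $keep;
        $safe = $i if $safe < $i;
        $out .= substr($buf, $i, $safe - $i);
        $buf = substr($buf, $safe);
    }
    return $out . $buf;
}

my $text = 'abcXYZdefXYZghi';
print replace_per_chunk($text, 'XYZ', '-', 5), "\n";   # abcXYZdefXYZghi (both matches missed)
print replace_chunked($text, 'XYZ', '-', 5), "\n";     # abc-def-ghi
```

The result of replace_chunked() is the same as a global replace on the whole string, whatever the chunk size.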

So not only do developers have to waste hours on housekeeping tasks, the results of their efforts are commonly still of rather poor quality; I doubt that the code I wrote for it is that much better. That is a doubly unpleasant price, for whoever has to pay for it.

Friday June 13, 2008
08:02 PM

Images in Spreadsheet files

Earlier this week, somebody who shall remain anonymous was wondering out loud on the Perlmonks Chatterbox why he couldn't put an image in a CSV file.

I was simply baffled.

How can anybody claim to be a (junior) programmer, and so totally lack any insight into why this is not possible, or even imaginable?

I asked him if he even knew what a CSV file was. Yes, it's a text file containing the text to put in fields in a table. Just the text.

It now makes me wonder if he knew what an image was, then.

It's things like that that really make me wonder whether there is any hope at all that somebody like that will ever pick up enough technical insight and skill to have any future as a programmer at all.

Thursday June 05, 2008
04:54 PM

Last-Modified and If-Modified-Since

Recently I have been experimenting with how browsers (all on Windows XP) respond to the presence of a Last-Modified header in the HTTP reply from a web server, in the context of generating semi-static content. I found the response of Firefox 2 most intriguing.

It appears that Firefox doesn't even look at the contents of this header; it just stores it for later. You can put a nonsense string in the Last-Modified header (from the server to the browser), and the next time the browser tries to fetch the file, it'll send the exact same string back in an If-Modified-Since header (from the browser to the server). I used "The bananas in my cellar are still quite green" as a test value, which, I hope you agree, looks nothing like a date. And that is exactly what I got back.

As a result, it acts like a private cookie, but just for this one URL, not for the whole domain, and not even for the siblings of this URL on the same path.

Opera 9 apparently behaves the same way.

Now, MSIE 7 and Safari are something else. MSIE does appear to look at the contents of this header: it simply drops it if it can't make a date out of it. The format it accepts is quite flexible. I sent it something close to ISO format: 'YYYY-MM-DD HH24:MI:SS "GMT"', to put it in Oracle's date formatting terms (for example: '2008-06-05 12:34:56 GMT'). But what it sent back was not the same string, but a date string converted back to the HTTP standard form: 'Dy, DD-Mon-YYYY HH24:MI:SS "GMT"' (for example: 'Thu, 05-Jun-2008 12:34:56 GMT'). (BTW, that doesn't make sense to me as a standard, looking at it from a date parsing point of view: it's too complex.)

Safari takes this one step further: if the header isn't a date in HTTP standard form, then the header is simply dropped. It simply doesn't send an If-Modified-Since header on the next request.

But it is safe to say that if your date in the Last-Modified header is in HTTP standard form, you will get the exact same string back in the If-Modified-Since header. No browser appears to change the value of the date. It doesn't matter if your clock is off, or you're in the wrong time zone... All that matters is that you'll get the exact same date string back as you sent out.

So, as a rule of thumb, how do I recommend using it? I now feel that you ought not to convert the date back to seconds-since-epoch, or whatever internal format you may be using, and then compare it to the file modification date. Instead, you ought to convert the file modification date to a standard HTTP date string, and compare this string to the If-Modified-Since header. If it's the exact same string, then you may safely send out a "304 Not Modified" status and not much else (no body). If it's a different string, then send the whole file, headers and body, again.

It doesn't matter if your clock is off. All that matters is that you must be consistent, and always format the same date/time into the same string. And then, it'll just work.

Note that using this scheme, it'll also send out the body if the file's current modification date is technically earlier than the date in the If-Modified-Since header. That's not bad; instead, it's better: all too often I find that someone replaced a file on a web server with a copy of an older file, and if you only check which date is the later one, then you will miss this change.
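A minimal Perl sketch of this exact-string scheme. It formats the modification time in the RFC 1123 form that HTTP/1.1 names as preferred (e.g. 'Thu, 05 Jun 2008 12:34:56 GMT'), builds the day and month names by hand so the result doesn't depend on the locale, and compares strings rather than parsed dates. The sub names are hypothetical, and a real handler would also have to send the Last-Modified header along with every 200 response.

```perl
use strict;
use warnings;

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date: always GMT, always the same layout,
# so the same mtime always yields the same string.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $epoch;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# Pick the response status from the file's mtime and the raw
# If-Modified-Since value (undef if the browser sent none).
# Any difference at all, including an older file, means "send it again".
sub conditional_status {
    my ($mtime, $if_modified_since) = @_;
    return 304
        if defined $if_modified_since
        && $if_modified_since eq http_date($mtime);
    return 200;
}

my $mtime = time;
my $stamp = http_date($mtime);
print "Last-Modified: $stamp\n";
print 'Status: ', conditional_status($mtime, $stamp), "\n";   # Status: 304
print 'Status: ', conditional_status($mtime, undef), "\n";    # Status: 200
```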

Saturday May 10, 2008
02:39 AM

What I like/dislike about Perl

Inspired by jarich's post (and thus indirectly by brian_d_foy's request) and by a question on arrays on Perlmonks, I've made a list of things I particularly like or dislike about Perl. Sometimes items in both lists stem from the same feature; such features are the result of a compromise, and I don't think they can easily be improved.

Note that these are things from the top of my head, so it's likely that I've forgotten about some stuff that I normally have a huge axe to grind with. So, here goes:

Likes about Perl:

  • Regexps! Hashes!
    Obviously, to me these were Perl's immediate selling points, 12 years ago.
    But, obviously, since then, many other languages have copied both features.
  • Simplicity/transparency of passing arguments to subs, using @_
    Data in @_ is "passed by reference"; it's assigning it to local variables that makes it "pass by value"
  • Flattening of lists: this makes join(', ', @foo, @bar) possible, and lets you implement a min function that can be used both as min(@foo) and as min($x, $y, $z)
  • scalar/list context
  • sort function, compared to the mess in PHP (array_multisort)
    Ease of implementing the Schwartzian Transform, the or-cache (Orcish maneuver), etc.
  • regular expressions as first class objects, as opposed to languages where a regexp is a string, until you use it as a regexp (i.e. most of them)
  • interpolation (doublequotish strings)
  • garbage collection, using reference counting
  • general syntax: no obsession with uniformity of syntax, the "adapted to humans" ad hoc syntax of Perl generally works very well
  • transparency of implementation, introspection. Extending/modifying Perl from within Perl is generally quite easy... For example: Hook::LexWrap, Carp, Fatal, Memoize
  • closures
  • map, grep
  • overload
  • The fact that "use" of a module is handled at the Perl level: import, but also BEGIN (and INIT, CHECK)
  • AUTOLOAD for possibly loading modules or generating subs on the fly
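A few of these likes are easiest to appreciate in code. The Perl sketch below shows @_ aliasing versus copying, list flattening in a hand-rolled min(), and two ways of not recomputing sort keys: the Schwartzian Transform and the or-cache. All sub names are made up for illustration; List::Util already provides a min().

```perl
use strict;
use warnings;

# @_ aliases the caller's variables, so this doubles them in place.
sub double_in_place { $_ *= 2 for @_; return }

# Copying @_ into lexicals first gives pass-by-value behavior.
sub doubled_copy {
    my @copy = @_;
    $_ *= 2 for @copy;
    return @copy;
}

# Flattening means one min() serves min(@list), min($x, $y) and mixes of both.
sub min {
    my $min = shift;
    for (@_) { $min = $_ if $_ < $min }
    return $min;
}

# Schwartzian Transform: compute each sort key once, sort on it, unwrap.
sub sort_by_length_st {
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ length($_), $_ ] } @_;
}

# Or-cache (Orcish maneuver): compute a key on first use, cache it with ||=.
sub sort_by_length_orcish {
    my %key;
    return sort {
        ($key{$a} ||= length $a) <=> ($key{$b} ||= length $b)
            or $a cmp $b
    } @_;
}

my @nums = (3, 1, 2);
double_in_place(@nums);                 # @nums is now (6, 2, 4)
my $smallest = min(@nums, 5);
print "$smallest\n";                    # 2
print join(' ', sort_by_length_st(qw(pear fig banana kiwi))), "\n";   # fig kiwi pear banana
```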

Dislikes:

  • nested subs and lexicals in the outer sub just don't work well together (DWIM? Are you kidding me?)

    What I really want is the way it works in JavaScript: inner subs are not visible outside the outer sub, and lexical variables from the outer sub are visible/accessible/shared in the inner subs.

    The reason it works this way in Perl is likely that a BEGIN block is a sub (you may even use the keyword "sub", though nobody does that) that's executed immediately, and subs used in a BEGIN block must be treated as global subs.

    As a result, this makes implementing a framework similar to mod_perl, where a plug-in looks like a Perl script but is loaded from file once and then called many times, unnecessarily hard.

    Hate, hate, hate!

  • lack of formal parameters
  • complex data structures can be hard, with confusion between array and array reference: ['a', 'b', 'c'] is called an "anonymous array", but actually it's an array reference. The fact that Perl distinguishes between arrays and array references can be a good thing, but it has its disadvantages.
  • lack of proper native support for aliasing: for scalars you can fake it with for my $x ($y) { ... }, and inside the block, $x is an alias to $y. There's no equivalent trick for aggregates.
  • Passing arrays/hashes by reference (of course) means the sub has to access the data through that reference. The above points make that not easy to remedy.
  • lack of a clean way to interpolate function/method calls in doublequotish strings; @{[...]} is a hack
  • constants, which are argument-less subs, and thus, you can't interpolate them in strings
  • need for $, @, % for variables in not-interpolating context (syntax pollution)
  • local doesn't work with lexical scalars, yet it does work on individual items in arrays/hashes, even lexical ones.
  • lack of ordering in hashes: the implicit "same order as insertion order" in PHP/JavaScript, as implemented in the modules Tie::IxHash/Tie::Hash::Indexed, works very well for me.
  • for OO, lack of proper instance variables. Access to attributes is low-level and looks ugly, like $obj->{x}. Direct access with $x instead of $obj->{x} would be most welcome, even if only as syntactic sugar.
  • syntax that is hard to parse with tools. If you can't modify Perl from within, it's virtually impossible to change it on the source level. Source filters are generally considered a poor idea, because the chance of getting them wrong is huge.
  • lack of embedding of custom "small languages", like SQL. For example, for DBI, I'd really prefer it if the syntax of embedded SQL could be checked at compile time (maybe assuming a broad SQL syntax, even if this particular database doesn't like it). This is not practically feasible because of the previous point (hard to parse syntax / source filters bad).
  • Exceptions are a hack. "eval BLOCK" is a terrible name, only used because "eval STRING" also catches errors... so it's a historical choice, not a functional one. It should have something like try/catch, or even the "On Error GoTo ERRLABEL" from VB.
  • $SIG{__DIE__} is called in eval
  • no functional/chaining versions of s/// and tr/// (as in JavaScript)
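Several of these dislikes are easiest to see in running code. The Perl sketch below (sub and variable names are made up for illustration) shows a named inner sub losing track of the outer lexical after the first call while an anonymous closure does not, the foreach aliasing trick, the @{[ ... ]} hack needed to interpolate a constant, local on an element of a lexical hash, and eval BLOCK used as try/catch with a $SIG{__DIE__} hook that has to check $^S to know it's inside an eval.

```perl
use strict;
use warnings;
use constant PI => 3.14159;

# A named inner sub binds to the *first* instance of the outer lexical.
# Perl warns at compile time: Variable "$x" will not stay shared.
sub outer_named {
    my $x = shift;
    sub inner_named { return $x * 2 }
    return inner_named();
}

# An anonymous sub is created anew on each call and captures this call's $x.
sub outer_anon {
    my $x = shift;
    my $inner = sub { return $x * 2 };
    return $inner->();
}

print join(' ', outer_anon(1), outer_anon(5)), "\n";     # 2 10
print join(' ', outer_named(1), outer_named(5)), "\n";   # 2 2

# Aliasing a scalar via foreach: inside the block, $alias *is* $y.
my $y = 'green';
for my $alias ($y) { $alias = 'ripe' }
print "$y\n";                                            # ripe

# Interpolating an expression or a constant needs the @{[ ... ]} hack.
my @items = qw(a b c);
my $msg = "Count: @{[ scalar @items ]}, pi is about @{[ PI ]}";
print "$msg\n";                                          # Count: 3, pi is about 3.14159

# local works on an element of a lexical hash, not on a lexical scalar.
my %config = (debug => 0);
sub debug_level { return $config{debug} }
my $inside;
{
    local $config{debug} = 1;
    $inside = debug_level();
}
print "$inside ", debug_level(), "\n";                   # 1 0

# eval BLOCK as try/catch; the __DIE__ hook fires even inside the eval,
# so it must check $^S (true inside an eval) to tell the cases apart.
my @hook_calls;
local $SIG{__DIE__} = sub { push @hook_calls, $^S ? 'in eval' : 'top level' };
my $ok    = eval { die "boom\n"; 1 };
my $error = $ok ? '' : $@;
print "caught: $error";                                  # caught: boom
```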

Hmm, and that off the top of my head... It has become quite a long list, actually.

Update: I knew I was bound to forget something. So, without delay, the addendum:

Pro:

  • General execution speed (apart from a slight delay at startup)

Contra:

  • Memory footprint, even for tiny scripts: at least several megabytes. It's enough to make me often convert frequently run short scripts into another language with a much smaller footprint.

Thursday May 08, 2008
03:40 PM

LOLcat

I don't follow what's going on in the LOLcat world closely, but when I recently saw this entry, it really made me laugh. It's both so disrespectful and so... ordinary, at the same time.

Enjoy.

All he ever wanted...

Thursday March 06, 2008
12:56 PM

More CPAN.pm silliness

The author used the regex snippet /(?!\n)\Z/ 17 times in the main source, instead of the simpler and equivalent /\z/.

From perlre:

\Z
Match only at end of string, or before newline at the end
\z
Match only at end of string

Duh? In a core module? Doesn't anybody but the maintainers ever check what goes into a core module?
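The equivalence is easy to check: \Z also matches just before a trailing newline, and the (?!\n) lookahead rules out exactly that position, which leaves only the true end of the string, i.e. \z. A small Perl check (the anchors() sub is made up for this purpose):

```perl
use strict;
use warnings;

# For a string, report whether 'c' is followed by each of three end anchors.
sub anchors {
    my ($s) = @_;
    return map { $s =~ $_ ? 1 : 0 } qr/c\Z/, qr/c\z/, qr/c(?!\n)\Z/;
}

print join(' ', anchors("abc")), "\n";     # 1 1 1
print join(' ', anchors("abc\n")), "\n";   # 1 0 0: only \Z accepts the trailing newline
```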

Tuesday March 04, 2008
04:13 PM

CPAN.pm weirdness

The latest official release of CPAN.pm, version 1.9205, contains a null byte in its source file.

I am somewhat surprised that perl doesn't trip over it, as there have been far more innocuous things that have made it stumble in the past, like line endings of another platform (Mac/Unix).

In case you're wondering: it's in sub CPAN::Shell::recent, at the start of this line (which appears only once in the source):

$desc =~ s/.+? - //;

p.s. It still is there in the latest developer release (1.92_57).
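Stray bytes like this are easy to locate. A minimal sketch, not anything taken from CPAN.pm: the hypothetical lines_with_nul() sub reads a file in raw mode and reports which lines contain a NUL byte. The demo runs it on a throwaway temp file rather than on the real CPAN.pm source.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the numbers of the lines in a file that contain a NUL (\0) byte.
sub lines_with_nul {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $. if index($line, "\0") >= 0;
    }
    close $fh;
    return @hits;
}

# Demonstration: a NUL at the start of line 2, as in the case above.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "first line\n\0second line\nthird line\n";
close $tmp_fh;
print join(', ', lines_with_nul($tmp_name)), "\n";   # 2
```

From the command line, perl -ne 'print "$.\n" if /\0/' FILE does the same job.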

Thursday February 07, 2008
07:44 AM

The end of a meme?

Oh no! According to this news article, Duke Nukem Forever will be released at the end of this year. Will this be the end of a meme? DNF is the prototypical example of eternal vaporware. Computer geeks love to make fun of it.

But, we're not there yet. It might still turn out right. Er, wrong. If not... we'll have to find something else to make fun of.

Tuesday January 29, 2008
08:01 PM

The pain of updating Perl

A few days ago I decided to upgrade ActivePerl on my laptop. Not the major upgrade to 5.10.0, not yet; I just wanted the new GUI version of PPM, just like I already had on my other computer. It's just a minor upgrade between builds of perl 5.8.8, from build 817 to build 822. That should be relatively painless... Not so.

Well, despite the fact that XS modules are binary compatible, the new build refuses to install on top of the older build. That means I'll have to uninstall perl, install the new version, and reinstall every module I had added. Ouch.

I remember having taken a Bundle snapshot with CPAN.pm over a year ago, and it wasn't pretty: installing that bundle resulted in CPAN.pm wanting to reinstall core modules. I didn't want to live through that again; besides, this being Windows, installing through CPAN would probably not be trivial for some modules. So this time, I was going to try to use PPM, and, preferably, automate it.

It's easy to get a list of modules installed with PPM, complete with version numbers, into a file, with ppm query * or (is this new?) ppm list. (Oh, fun: apparently the output format has changed.)

But after that, I'm stuck. How the hell do you use that list to install those packages automatically? I'm stumped. I want to:

  • install modules I don't have yet, and
  • upgrade modules that are out of date.

Simple enough. But having PPM do just that, by feeding it that list, simply doesn't appear to be a supported feature.
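The decision itself is simple to sketch, assuming both lists have already been reduced to name => version pairs (the actual ppm list output, whose format apparently changes between builds, is not parsed here). The hypothetical Perl helper below uses the version module, core since Perl 5.10, to split the wanted packages into those to install and those to upgrade; actually feeding the results to ppm is left out, and the version numbers in the demo are made up.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: given the wanted and the currently installed
# packages (name => version), work out what to install and what to upgrade.
sub plan_ppm_actions {
    my ($wanted, $installed) = @_;
    my (@install, @upgrade);
    for my $name (sort keys %$wanted) {
        if (!exists $installed->{$name}) {
            push @install, $name;                 # not there at all
        }
        elsif (version->parse($installed->{$name}) < version->parse($wanted->{$name})) {
            push @upgrade, $name;                 # there, but out of date
        }
    }
    return (\@install, \@upgrade);
}

my %wanted    = ('Crypt-SSLeay' => '0.57', 'Win32-API' => '0.55', 'WWW-Mechanize' => '1.34');
my %installed = ('Win32-API' => '0.46', 'WWW-Mechanize' => '1.34');
my ($install, $upgrade) = plan_ppm_actions(\%wanted, \%installed);
print "install: @$install\n";   # install: Crypt-SSLeay
print "upgrade: @$upgrade\n";   # upgrade: Win32-API
```

Comparing through version objects rather than as plain strings or floats keeps "0.46" vs "0.55" and similar cases in the order CPAN-style version numbers expect.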

So I ended up installing most of these modules by hand, list in hand. Well, I tried. It turned out some of the modules were still not properly installed. For example, Crypt::SSLeay was missing its DLL, and Win32::API just didn't work.

So now, days later, I'm still stuck with an incomplete set of reinstalled, and possibly broken, modules. I now just have to install additional modules when I find some script is broken. Oh, joy.

And then, there are still some modules (WWW::Mechanize and HTML::TokeParser::Simple) whose API had changed, so, with freshly installed (and upgraded) modules, my scripts just didn't work any more. I've had to figure out what changed and modify the scripts. Not fun.

I'm not looking forward to upgrading to 5.10.

p.s. I have some vague plans, if necessary, to write a shell script, controlling ppm through the command line, to install or upgrade the whole list.

Thursday January 24, 2008
06:52 PM

Badmouthing Perl

Today, when browsing through the popular sites of the day, I found several sites where the author found it necessary to sneer at Perl, where it wasn't even the subject of the post. And I wasn't even searching for it. Is this the new custom?
  1. GTK Hello World in Six Different Languages

    This is actually the least worrisome of the lot:

    Although I find writing Perl to be painful for everything but processing text files in a terminal, I found the Perl GTK bindings to be relatively straightforward.

  2. "If you don't know how compilers work, then you don't know how computers work"

    Steve Yegge writes:

    You discover that jsdoc is a miserable sod of a Perl script that seg faults on about 50% of your code base, and — bear with me here — you've vowed never to write another line of Perl, because, well, it's Perl. Pick your favorite reason.

  3. Can Dynamic Languages Scale?

    This is by far the worst of the lot, gratuitous Perl bashing:

    It's as Marx said, lo these many years ago: "From each language, according to its abilities, to each project, according to its needs."

    Oh, except Perl. Perl just sucks, period. :-)

These people don't appear to even know Perl, or at least, don't appear to know it well enough.

Just stop it, please. It's not funny.