NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

Ovid (2709)

Ovid
  (email not shown publicly)
http://publius-ovidius.livejournal.com/
AOL IM: ovidperl

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Thursday December 24, 2009
06:29 AM

Improve My Perl 6!

Friday December 18, 2009
09:15 AM

Gitpan languages

What? You didn't know that SOAP::Lite was written in Visual Basic?

Thursday December 17, 2009
05:40 AM

Atom Feed Help

Monday December 14, 2009
07:56 AM

MySQL and Oracle

MySQL and Oracle. (Despite teething pains, blogs.perl.org is holding up quite well on the new server.)

Saturday December 12, 2009
07:24 AM

What's In A Name?

The actual entry is at blogs.perl.org. However, here are a few extra notes about that site:

It's much more stable than last time. Aaron moved it to a new server and Dave set up MT. The templates are on github along with an issue tracker.

Barbie talked with Dave and me at the London Perl Workshop (lpw2009) about the site; I wrote down all of his suggestions and they've now been added to the issue list. We look forward to more people trying it out, and Dave has a post asking for more feedback. Also, he's aware that several people only made it partway through the registration process. Drop him a line if you're affected and he'll clear it up for you.

I can't thank Dave, Aaron and SixApart enough for all they're doing to create a modern blogging platform for us.

Tuesday December 08, 2009
09:02 AM

Regex Captures in Debugger

Stumbled across this weird behavior today. Took a while to debug it. In the debugger, I'm not seeing the "dollar digit" regex capture variables set, even though the regex matches.

$ perl -de 1

Loading DB routines from perl5db.pl version 1.3
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1):   1
  DB<1> ($foo) = ('abcd' =~ /(bc)/)

  DB<2> x $1
0  undef
  DB<3> x $foo
0  'bc'

Is this documented? I can't find it.

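Update: Rafael explained it. The digit variables are scoped to the enclosing block (perlvar calls them dynamically scoped), and the debugger evaluates each command in its own scope, so $1 is gone by the next prompt. It's the same reason a package variable you assign in the debugger survives to the next command, but a lexical does not.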

  DB<6>  'abcd' =~ /(bc)/ && print $1
bc
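The behaviour can be reproduced outside the debugger. The sketch below assumes the debugger runs each command through something like an eval STRING (roughly what perl5db.pl does); the function names are made up for illustration. A match inside the eval sets $1 only until the eval finishes, after which $1 reverts to the last successful match in the outer scope, exactly as perlvar describes for nested blocks.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics one debugger prompt by matching inside eval STRING.
sub capture_after_eval {
    'xyz' =~ /(y)/ or die "no match\n";                  # outer scope: $1 is 'y'
    eval q{ 'abcd' =~ /(bc)/ or die "no match\n" };      # $1 is 'bc' only inside the eval
    die $@ if $@;
    return $1;                                           # reverted to the outer match
}

# The workaround used in the post: copy the capture out while it is still in scope.
sub capture_copied_out {
    my ($foo) = ( 'abcd' =~ /(bc)/ );
    return $foo;
}

print capture_after_eval(), "\n";
print capture_copied_out(), "\n";
```

The first call returns the outer match's capture rather than 'bc', which is the same surprise as `x $1` showing undef at the DB<2> prompt.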

Monday December 07, 2009
10:38 AM

The Implications of the Bug

mscolly correctly identified the SQL bug I posted. Sadly, no one discussed the implications of this bug and I think they're the most interesting part of this. Essentially, it comes down to the following:

SELECT    first_name,
          last_name,
          order_date,
          SUM(price) AS total                  -- what if there's no price?
FROM      customer
LEFT JOIN orders     ON customer.id = orders.customer_id
LEFT JOIN order_item ON orders.id   = order_item.order_id
GROUP BY  first_name, last_name, order_date
HAVING    total < 15                           --  what does NULL < 15 evaluate to?
ORDER     BY order_date ASC;

The proper solution (as mscolly pointed out) is to change the "SUM" line to this:

COALESCE( SUM(price), 0 ) AS total
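To see why the COALESCE matters, here is a small Perl model of the query's aggregate and HAVING steps, with undef standing in for NULL. The customers and prices are invented; the model assumes the usual SQL rules that SUM skips NULLs and returns NULL when nothing is left, and that HAVING keeps a group only when its condition is TRUE.

```perl
use strict;
use warnings;

# SUM ignores NULLs and yields NULL when there is nothing to add up.
sub sql_sum {
    my @values = grep { defined } @_;
    return undef unless @values;
    my $total = 0;
    $total += $_ for @values;
    return $total;
}

# COALESCE returns its first non-NULL argument.
sub coalesce {
    for my $value (@_) { return $value if defined $value }
    return undef;
}

# HAVING total < 15: an undef total compares as UNKNOWN, so the group is dropped.
sub under_15 {
    my ($total) = @_;
    return defined $total && $total < 15;
}

# Hypothetical data: item prices per customer after the two LEFT JOINs.
my %prices_for = (
    alice => [ 5, 4 ],     # spent 9
    bob   => [20],         # spent 20
    carol => [undef],      # no orders: the LEFT JOINs yield one row with a NULL price
);

sub promo_customers {
    my ($use_coalesce) = @_;
    my @keep;
    for my $name ( sort keys %prices_for ) {
        my $total = sql_sum( @{ $prices_for{$name} } );
        $total = coalesce( $total, 0 ) if $use_coalesce;
        push @keep, $name if under_15($total);
    }
    return @keep;
}

print join( ' ', promo_customers(0) ), "\n";    # without COALESCE
print join( ' ', promo_customers(1) ), "\n";    # with COALESCE
```

Without COALESCE only alice qualifies; with it, carol (who has spent nothing at all) is included too, which is the customer the promotion was really meant for.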

The English language, as we know, is ambiguous. If your boss had come in and asked for all customers whose orders (more accurately, whose orders with order items) totaled less than £15, then the above query would actually have been correct, but another programmer coming along to maintain it could be forgiven for thinking it's in error. If you ever write SQL that is likely to produce NULLs (e.g., outer joins), you should explicitly handle that case if you actually do anything with those NULLs.

But in this case, "customers whose orders total less than £15" is significantly different in meaning from "customers who spent less than £15"; the latter is what we want, but the former is what we have. While the above code seems logical, it gives a logically incorrect answer because it omits customers without orders (or order items), even though they're clearly intended. However, NULLs make it very difficult to identify what you actually mean because the database can't know why something is NULL.

Now consider a simpler, yet silly, example:

SELECT first_name
FROM   employee
WHERE  salary > 50000;

What happens if the salary field is NULL? You'll get only the employees whose salary is known and greater than 50000; anyone with a NULL salary silently drops out. Why might they not have a salary? Maybe they're an hourly employee and the salary field is not applicable. Maybe they're the CEO and he doesn't think you need to know his salary. Maybe they're an ex-employee and they have no salary.

Taking this a bit further, imagine that all employees in the table are current and all have salaries (no hourly workers), but the salary field is still sometimes NULL because the board of directors doesn't want you to know their salaries. With me so far? In this scenario, it is the case that everyone has a salary; you just don't know what some of them are. So here's the kicker:

SELECT first_name
FROM   employee
WHERE  salary = salary;

That won't return anyone on the board of directors, even though you know they have a salary. Furthermore, most would think it's self-evident that p = p, but in the three-valued logic of databases, this is sometimes true and sometimes unknown (and WHERE treats unknown as if it were false). Heck, because of this, the following does not always produce either '=' or '!=', even though we would think it must:

SELECT service_id,
    CASE WHEN master_brand_id =  master_brand_id THEN '='
         WHEN master_brand_id != master_brand_id THEN '!='
    END AS 'comparison'
FROM service;

Sure, you say, but you're comparing something to itself. You don't do that in the real world. No? So look at this:

SELECT s.service_id,
    CASE WHEN m.master_brand_id =  s.master_brand_id THEN '='
         WHEN m.master_brand_id != s.master_brand_id THEN '!='
    END AS 'comparison'
FROM service      s,
     master_brand m;

If s.master_brand_id is allowed to be NULL, then the comparison field will always have a NULL value when s.master_brand_id is NULL. It's easy to debug in this simple example, but what if that was a subquery? It looks fine, but it all breaks down in the presence of NULL values.
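The p = p and CASE surprises can be mimicked in a few lines of Perl. This is only a sketch of SQL's three-valued logic, assuming undef stands for both NULL and the truth value UNKNOWN; the helper names are made up for illustration.

```perl
use strict;
use warnings;

# Comparisons with a NULL (undef) operand yield UNKNOWN (undef), not false.
sub sql_eq {
    my ( $x, $y ) = @_;
    return undef if !defined $x || !defined $y;
    return $x == $y ? 1 : 0;
}

sub sql_ne {
    my ( $x, $y ) = @_;
    return undef if !defined $x || !defined $y;
    return $x != $y ? 1 : 0;
}

# WHERE keeps a row only when the predicate is TRUE; FALSE and UNKNOWN both drop it.
sub sql_where {
    my ( $predicate, @rows ) = @_;
    return grep { my $truth = $predicate->($_); defined $truth && $truth } @rows;
}

# CASE WHEN a = b THEN '=' WHEN a != b THEN '!=' END: with no ELSE, the
# result is NULL when neither branch is TRUE.
sub compare_case {
    my ( $x, $y ) = @_;
    my $eq = sql_eq( $x, $y );
    return '=' if defined $eq && $eq;
    my $ne = sql_ne( $x, $y );
    return '!=' if defined $ne && $ne;
    return undef;
}

my @employees = (
    { first_name => 'Alice', salary => 40_000 },
    { first_name => 'Bob',   salary => undef },    # board member, salary hidden
);

# SELECT first_name FROM employee WHERE salary = salary;
my @names = map { $_->{first_name} }
    sql_where( sub { sql_eq( $_[0]{salary}, $_[0]{salary} ) }, @employees );
print "@names\n";    # only Alice
```

Bob certainly has a salary, but salary = salary is UNKNOWN for his row, so he never appears; compare_case(undef, undef) likewise returns neither '=' nor '!='.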

I didn't start with that example because people would say it's silly, but starting with the "order" example shows how NULLs in databases can return logically incorrect data and the reduction down to the simple p = p case not holding shows why this happens.

At this point, I can see people saying "yeah, but we already know that about databases." And this is true. It's well-known that certain types of queries can generate NULLs even though there are no NULL values in the database. Regrettably, many people assume the database logic is, well, logical. The p = p failure is a strong rebuttal, but I suppose some people assume that hitting themselves in the head with a hammer is normal.

If you really want to have some fun, read this blog entry about NULL values. In the comments, the author even explains how to deal with NULLs in outer joins, but it requires a relational database (very few databases really are) and that people understand what first normal form is really about. (If you think you know, please define "atomic values" in the comments below).

I wonder how database design would look today if, instead of 3VL, databases threw an exception when you tried to apply an operator or aggregation ('=', '+', 'SUM', etc.) to NULL values?
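As a thought experiment only (no database mentioned here behaves this way), the "throw instead of UNKNOWN" idea might look something like this in Perl, again with undef standing in for NULL. strict_eq and strict_sum are hypothetical names.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical strict-NULL semantics: applying = to a NULL is an error.
sub strict_eq {
    my ( $x, $y ) = @_;
    croak 'NULL operand to =' if !defined $x || !defined $y;
    return $x == $y ? 1 : 0;
}

# Likewise, an aggregate refuses to silently skip NULLs.
sub strict_sum {
    my @values = @_;
    croak 'NULL value in SUM' if grep { !defined } @values;
    my $total = 0;
    $total += $_ for @values;
    return $total;
}

my $ok = eval { strict_eq( undef, 15 ); 1 };
print $ok ? "compared\n" : "error: $@";
```

Under this rule the "customers who spent less than £15" query would fail loudly instead of quietly omitting customers, forcing the author to say what a missing total means.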

Note: I've discussed the problem with NULL values before, but in realizing I had a better real-world example, I thought it would make more sense to readers.

02:43 AM

Find the bug (sql)

Update: You can ignore the order_date below. It's a red herring and I probably should have left it out, but I had liked the fact that by putting it in the query, I added more complexity, thus making the real bug more difficult to spot.

Assume you're a diligent programmer. You've designed your database carefully. Foreign key constraints are correct, you have no nullable columns and you've kept a nice, simple design. Now your boss wants a list of all customers who've spent less than £15 on your Web site so you can offer them a special promotion. Here's the SQL you've written:

SELECT    first_name,
          last_name,
          order_date,
          SUM(price) AS total
FROM      customer
LEFT JOIN orders     ON customer.id = orders.customer_id
LEFT JOIN order_item ON orders.id   = order_item.order_id
GROUP BY  first_name, last_name, order_date
HAVING    total < 15
ORDER     BY order_date ASC;

The two left joins are there because a customer may never have placed an order. Heck, they may have started an order but not added any items to it. The HAVING clause is required because you generally can't use aggregates in WHERE clauses.

You run the SQL and hand-check the results very carefully. Choosing a random sampling of customers returned, you verify that none of them have spent more than £15 on your site. Nonetheless, you have a bug. What is it? What are the implications of the bug?

Friday December 04, 2009
08:32 AM

Pitfalls in Converting Base Classes to Roles

I should have responded to this a while ago, but didn't. Basically, nothingmuch has a blog post responding to some of my comments about roles and it's worth reading. However, there's one problematic bit:

If you take a working multiple inheritance based design and change every base class into a role, it will still work. Roles will produce errors for ambiguities, but if the design makes sense there shouldn't be many of those to begin with. The fundamental structure of the code hasn't actually changed with the migration to roles.

For the majority of code bases I see, this is doubtless true. However, I tend to get jobs where I work on large code bases. For example, just looking at our lib/ directory:

$ find lib/ -name '*.pm' | wc -l
652

That's not huge, but it's not particularly small, either. And it doesn't count our various scripts, tests, schemas, etc. So let's say I want to add a boolean attribute to an object and said attribute must be persisted in the database and must be readable/writeable via our API. That's a single attribute. Just one. I need to:

  • Write the database migration level
  • Update our RelaxNG schemas
  • Update our data extractors (extracts data from XML docs)
  • Add the column to our domain objects
  • Add the column to our resultsource classes
  • Possibly add the column to resultset classes
  • Update our XML builders
  • Write tests
  • And possibly more, depending on why we're adding the attribute

That's a lot of work: even uninterrupted, adding a single boolean attribute can take me an hour or two. Believe it or not, this is easier than it used to be. And much of that isn't the silly grunt work it appears to be, because a lot of it is hard to automate away safely (my unreleased Bermuda project was an attempt to do this). Some of it can be automated and we've managed some of that, but not all.

In short, we have a complicated code base that has decades of programmer hours in it. We used to rely a lot on multiple inheritance, despite the well-known issues with MI. Now let's revisit the key sentence in nothingmuch's post: "If you take a working multiple inheritance based design and change every base class into a role, it will still work."

To nothingmuch's credit, he does point out that ambiguities will cause errors, but there's a lot more to it than that. As projects grow organically, things often get put in the wrong place. We had all sorts of code shoved into base classes that our derived classes shouldn't have inherited, but did. Since tests tend to be written against expectations, it's easy to not write tests for things you don't expect.

Classes sometimes end up with inappropriate behavior because abusing inheritance tends to mean having classes manage both responsibilities and cross-cutting concerns. I've found that base classes often had to be broken up into two or more roles (nothingmuch has a great example of Test::Builder doing the same thing and that's a small code base). That can get really annoying when you find that each role requires the same private method and you're trying to figure out where to put it. As you juggle things like this, behaviors will change and you had better have a good test suite to catch them.

There's another, more subtle problem. Imagine that you have the following (it gets much worse when MI is involved):

AbstractParent
     ^
     |
AbstractChild
     ^
     |
ConcreteClass

In ConcreteClass we have this:

sub clean_the_kitchen {
    my $self = shift;
    $self->SUPER::clean_the_kitchen;
    $self->mop_the_floor;
    return $self;
}

What happens with that SUPER:: call? If it's only in AbstractParent, you're fine, but if it's also in AbstractChild, converting AbstractChild into a role will break your code. Further, it might break your code in very subtle ways that are hard to detect. This might seem uncommon, but we actually had this happen quite a bit when we were converting over to roles. This means that our concrete class methods needed to be written as:

after 'clean_the_kitchen' => sub {
    my $self = shift;
    $self->mop_the_floor;
    return $self;
};
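The SUPER:: pitfall can be reproduced without Moose. In this sketch the "role" is just a hash of subs and apply_role is a hypothetical stand-in for role composition; it assumes Moose's rule that a method the class defines itself wins over the role's method, with no error. The log_step helper and the kitchen steps are invented for illustration.

```perl
use strict;
use warnings;

# Before: the inheritance chain from the diagram above.
package AbstractParent;
sub new      { my ($class) = @_; return bless { log => [] }, $class }
sub log_step { my ( $self, $step ) = @_; push @{ $self->{log} }, $step; return $self }
sub clean_the_kitchen { my $self = shift; return $self->log_step('wipe counters') }

package AbstractChild;
our @ISA = ('AbstractParent');
sub clean_the_kitchen {
    my $self = shift;
    $self->SUPER::clean_the_kitchen;
    return $self->log_step('wash dishes');
}

package ConcreteClass;
our @ISA = ('AbstractChild');
sub mop_the_floor { my $self = shift; return $self->log_step('mop the floor') }
sub clean_the_kitchen {
    my $self = shift;
    $self->SUPER::clean_the_kitchen;    # AbstractChild, then AbstractParent
    $self->mop_the_floor;
    return $self;
}

# After: AbstractChild's behaviour moves into a role, so the class now
# inherits from AbstractParent directly. The method body is unchanged.
package ConcreteRoleClass;
our @ISA = ('AbstractParent');
sub mop_the_floor { my $self = shift; return $self->log_step('mop the floor') }
sub clean_the_kitchen {
    my $self = shift;
    $self->SUPER::clean_the_kitchen;    # now jumps straight to AbstractParent
    $self->mop_the_floor;
    return $self;
}

package main;

my %kitchen_role = (
    clean_the_kitchen => sub { my $self = shift; return $self->log_step('wash dishes') },
);

# Hypothetical role composition: copy each role method into the class unless
# the class already defines it (the class's own method silently wins).
sub apply_role {
    my ( $target, $methods ) = @_;
    no strict 'refs';
    for my $name ( keys %$methods ) {
        next if defined &{"${target}::$name"};
        *{"${target}::$name"} = $methods->{$name};
    }
    return;
}

apply_role( 'ConcreteRoleClass', \%kitchen_role );
```

ConcreteClass->new->clean_the_kitchen logs wipe, wash, mop; the role-based version logs only wipe and mop. Nothing dies and nothing warns, which is why this kind of change is so easy to miss.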

That's easy to get wrong and, frankly, method modifiers such as before, after and around generally make me feel uncomfortable. Like role method aliasing and excluding, they're code smells suggesting your design needs work. However, as we were trying for "the simplest thing that could possibly work", we went with this with the intent of revisiting it later. We still haven't revisited it. We have work to do.
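For readers who haven't used method modifiers, this is roughly what an `after` modifier does, hand-rolled in plain Perl. install_after and the Kitchen class are hypothetical; the sketch assumes the Moose-style semantics where the original method runs first, the extra code runs next, and the extra code's return value is ignored.

```perl
use strict;
use warnings;

package Kitchen;
sub new { my ($class) = @_; return bless { log => [] }, $class }
sub clean_the_kitchen { my $self = shift; push @{ $self->{log} }, 'wipe counters'; return $self }
sub mop_the_floor     { my $self = shift; push @{ $self->{log} }, 'mop the floor'; return $self }

package main;

# Replace $package::$name with a wrapper that calls the original method,
# then $extra, and returns whatever the original returned.
sub install_after {
    my ( $package, $name, $extra ) = @_;
    no strict 'refs';
    no warnings 'redefine';
    die "No method $name in $package\n" unless defined &{"${package}::$name"};
    my $original = \&{"${package}::$name"};
    *{"${package}::$name"} = sub {
        my @result = wantarray ? $original->(@_) : scalar $original->(@_);
        $extra->(@_);
        return wantarray ? @result : $result[0];
    };
    return;
}

install_after( 'Kitchen', 'clean_the_kitchen', sub { $_[0]->mop_the_floor } );
```

Seeing the wrapper spelled out shows why modifiers can feel uncomfortable: the behaviour of clean_the_kitchen now depends on code installed somewhere else.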

In short, converting base classes to roles and resolving composition errors is often enough for small code bases, but as code bases get large and complex, it's much harder. There are plenty of pitfalls and they're not always obvious. Plus, despite having excellent code coverage for tests, you probably don't have great path coverage (few do). This means that switching from base classes to roles can easily introduce behavioral changes that you haven't been expecting.

See also Tips For Converting Base Classes Into Roles. (gosh, tags would be useful)

Tuesday December 01, 2009
09:27 AM

Why Should I Program in $Language?

For the developer evaluating programming languages, the question of which languages to learn rarely involves "can this language do what I want it to do?" This is because virtually all programming languages (except ANSI SQL, if you consider it a language) are Turing complete. Instead, what I see, over and over, are discussions of two things:

  • Can I get a job in this language?
  • Do I like this language?

So consider COBOL. Yes, I can get a job in that language, but do I like it? No. C is the same way for me. There are plenty of jobs out there, but I loathe how close it is to the metal because I like solving problems, not worrying about how to reallocate memory for a resized string. So if I were to assert that either COBOL or C is dead, that might be true ... for me. If I assert they're dead on the basis of job stats, I'd be sorely mistaken.

Naturally, when I hear the old "Perl is dead" saw, I know the job stats and I know they're not referring to them. However, what they invariably refer to is the "buzz" around Perl, measured by TIOBE, Google Trends and other tools. The buzz can be important, but at the end of the day, buzz won't matter if the tool is useful enough. Consider how much people complain about Perl's sigils. Now look at the following code:

$cssClass = "";
if ( $element->isOpen() ) {
    $cssClass = ' class="selected"';
}
if ( !$element->isHiddenInNavigation() ) {
    $filepath = $element->getId() . ".html";
    if ( substr( $element->getFilePath(), 0, 4 ) == 'http' ) {
        $filepath = $element->getFilePath();
    }
    $result .= '<li'
      . $cssClass
      . '><a href="'
      . $filepath . '">'
      . $element->getLabel() . '</a>'
      . $this->generate( $element->getChildren() ) . '</li>';
}

Assuming that has no errors, we know that it looks like Perl, but isn't (look closely if you don't see it). That's PHP code. So though people whine about sigils in Perl, you've got the same issue with PHP. Why, then, has PHP crushed us in the Web space? Well, for one thing, people seem far happier with mod_php than mod_perl. It's very easy to develop and deploy apps in PHP, despite its inconsistencies and its lack of many of the useful features that Perl 5 has.

A second issue, however, is the one which will continue to keep Perl 5 sneered at:

public function setHiddenInNavigation($hideInNavigation) {
    return $this->hideInNavigation = $hideInNavigation;
}

That's PHP, too. Until we can make Perl 5 code that clean and simple and in the core[1], we'll continue to be sneered at. It's embarrassing to read through PHP code and, once again, sigh at basic, basic things Perl 5 is missing (yes, PHP is missing many things too, but I'll bet most 9 to 5 programmers would take method signatures over closures any day of the week).
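For comparison, here is a sketch of the same setter in plain core Perl 5 as it stands today, with no method signatures and no object system beyond bless. The Element class and the snake_case method name are a hypothetical port of the PHP above, not code from any real project.

```perl
use strict;
use warnings;

package Element;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# No signatures in core: unpack @_ by hand and store into the blessed hash.
sub set_hidden_in_navigation {
    my ( $self, $hide_in_navigation ) = @_;
    return $self->{hide_in_navigation} = $hide_in_navigation;
}

package main;

my $element = Element->new;
$element->set_hidden_in_navigation(1);
```

It works, but the argument unpacking and the hand-rolled constructor are exactly the boilerplate the PHP version doesn't need.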

So really, programmers are asking if a language can do what they want it to do; at least on a syntax level. That's because syntax is a large part of the aesthetic appeal of a language.

So in answer to the question "Why Should I Program in $Language?", what the language can do (functionally, not syntactically!) isn't really the issue. Nor is "are there jobs?", because we know there are plenty of Perl jobs out there (though not in all areas). The real question is "will I like it?" and, judging from the numerous Perl hate posts, this is very much an open question, but Perl largely has itself to blame.

1. We have a better object system than you do! Just install Moose (but be aware that it might have backwards-incompatible changes in a few months). Oh, to install it, you just need to configure CPAN. What? You don't have root? Here's how to configure CPAN to install in a local directory. And be aware that Module::Build and ExtUtils::MakeMaker take different arguments, and don't forget that ...