Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

gnat (29)

gnat
  (email not shown publicly)

Journal of gnat (29)

Thursday May 22, 2003
06:32 PM

The Power of Perl

[ #12388 ]
Last night I played with three databases. O'Reilly subscribes to Bookscan, and we get a spreadsheet every week showing the last week's sell-through (actual sales in stores) and the year-to-date sell-through for umpty-zillion technical books.

Yes, a spreadsheet. Feh and bah to spreadsheets. I saved as CSV and used Text::CSV_XS to write a quick search routine (give me all the books with "XML", "Schema", or "XSLT" in the title; give me all the O'Reilly books; etc.). That was database 1.

Then I wrote a quick screenscraper for the O'Reilly intranet sales page, to fetch the year-to-date sales from the web interface to our data. That was database 2.

Then I broke out my command-line client for the XML-RPC interface to our Amazon ranks database. Database 3.

I found out that Amazon is far more geeky than the Barnes and Noble-type stores that Bookscan reports on. AOL for Dummies sells really big through real stores, phenomenally poorly on Amazon. (Not surprising, really--once you're on the internet, you probably don't have much use for AOL for Dummies).

Then I compared our sales to bookscan sales to see whether it was possible to make any sense of the two. It wasn't. In some cases, sell-through was more than 100% of the orders our warehouse received. All that means is that the some of the chains had stock, sold it, and haven't reordered from us. Probably because they had too much stock to begin with.

All this from Perl. I'd have been boned trying to do it in Excel or any (spit) application. Proof that every company's financial department should have at least one Perl programmer.

Interestingly, the HTML tables returned by the intranet database interface were so awful that HTML::TableContentParser horked. I had to resort to regular expressions. Interestingly enough, the code became shorter when I did that, and easier to read. How odd ...

Another dead-end was using DBD::CSV. Because I wanted to full-text search the record, regardless of field, constructing SQL queries was too painful and slow. And I couldn't get RLIKE to behave. So Text::CSV_XS was perfect.

--Nat
(all this took about two hours, half of which was spent debugging the HTML scraping and bumbling around with DBD::CSV)

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • Right-O (Score:3, Insightful)

    by wickline (135) on 2003.05.23 8:02 (#20455) Journal
    > Proof that every company's financial department
    > should have at least one Perl programmer.

    and so should every other department.

    I bet that if I were to sit behind a random employee for a day, I'd have very good odds of finding something they're doing that could be made more efficient with a perl script or few.

    If I didn't already have plenty of stuff to do at work, I'd actually like to do that sort of shoulder surfing. Too few folks know that we're available to make their lives easier. Of course, if more folks know, they'd have to hire more of us, but I think the gain in productivity would be well worth it.

    Every department everywhere needs a staff of roving perl programmers on the look out for tasks to simplify and automate :)

    -matt
    • Yes, but then you risk putting the guy who gets the specifications from the customer and brings them to the programmers out of work.</Office Space>

      It'd be an interesting freelance service, but then 90% of the demand would be on Wintendo and probably more tedium than fun.

  • You might also find Spreadsheet::ParseExcel [cpan.org] and Spreadsheet::ParseExcel::Simple [cpan.org] useful.

    John.
    --

  • You might want to peep through AnyData and DBD::AnyData sometime. :-)

    LOADS of fun can be had with these.
  • When I have to mess with CSV material, I almost always use Text::xSV and dump it into a SQLite database. DBD::SQLite is much more fully featured than DVD::CSV (IMNSHO), and I'm not above dropping out of perl to the db shell for a few statements to best determine where to focus my code.

    SQLite never fails to amaze me with the high degree of features and performance coupled with vastly easy use. Few "throw-away" projects would have me dump data into a "real" db, even one I have running on the system, but I