Journal of markjugg (792)

Thursday April 03, 2008
12:33 PM

Two alternate patches for rows-as-hashrefs in Text::CSV_XS

[ #36046 ]

H.Merijn Brand, the Text::CSV_XS maintainer, has been discussing possibilities for adding support for parsing rows as hashrefs to that module through this RT ticket.

As fate would have it, our efforts to implement it crossed paths, and we now both have fairly complete but somewhat different patches for the feature. A couple points to get feedback on:

Which do you find clearer for setting the column names to use as the hash keys:


column_names()
or
hr_keys()

I have already been confused about whether "hr" stood for "header row" or "hashref", so I vote for the former.

The second point, which is currently in neither patch, is "how would you design the interface for automatically setting the column names from the first row of the CSV?"

Parse::CSV uses new( fields => 'auto' ), but involving new() won't work for Text::CSV_XS.

I was thinking of perhaps:


$csv->column_names_from_line($io);

Which would simply mean:


$csv->column_names( $csv->getline($io) );

We would leave it up to documentation to make sure users called this first thing.
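
For illustration, here is a minimal sketch of how the explicit form might look in use. It assumes a Text::CSV_XS release that provides column_names() and getline_hr(); the sample data and the field names "name" and "lang" are made up for the example.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data, held in memory so the sketch is self-contained.
my $data = "name,lang\nMark,Perl\nMerijn,Perl\n";
open my $io, '<', \$data or die "Cannot open in-memory file: $!";

my $csv = Text::CSV_XS->new({ binary => 1 });

# Consume the first line and use its fields as the hash keys.
# This has to happen before any data row is read.
$csv->column_names( $csv->getline($io) );

# Every remaining row comes back as a hashref keyed by column name.
my @rows;
while ( my $row = $csv->getline_hr($io) ) {
    push @rows, $row;
}
close $io;

printf "%s writes %s\n", $_->{name}, $_->{lang} for @rows;
```

Because the header line is read with an ordinary getline(), nothing is hidden: the filehandle simply ends up positioned at the first data row.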

Alternatively, you could have a function that stores the current file position, rewinds and reads the first row, and then returns to the current position. That seems more fragile to me, and I can only imagine there are some non-rewindable filehandles out there for which it wouldn't work.

You can leave feedback here and/or in the RT ticket.

Thanks!

      Mark

    1. No contest: column_names.

    2. Definitely no rewinding magic. In fact, I wouldn’t even include a column_names_from_line sugar method, because all it does is save five keystrokes on a rare operation for the price of concealing how that works. And that’s the only reason users might forget about having to call that method first thing.

      If you just tell them that they need to set up the column names manually (and here’s an easy way to take them from the first line of the file), then it’s p

    • Thanks for the feedback. I'll lean towards not proposing the sugar method for the reasons you give.
  • I agree that column_names is much better than the other.

  • But have you seen the interface that Text::xSV [cpan.org] uses?

      use Text::xSV;
      my $csv = new Text::xSV;
      $csv->open_file("foo.csv");
      $csv->read_header();

      while (my $data = $csv->fetchrow_hash) {
        # do stuff...
      }

    Personally I quite like the read_header() function.

  • OK, column_names () it is.

    I liked the idea of hr_** for its double meaning: Hash Ref and Header Row, but that might be professional brain deformation from my side.

    While I was designing this, I also had DBI in mind, and the obvious next step to try is bind_columns ().

    With the new column_names (), it would be nice to do a DBI-like bind_keys () so fields are stored in the same scalars over and over again, instead of creating a new scalar for every field on every line.

    This *could* mean

    --
    Enjoy, have FUN! H.Merijn
  • This is offered as an example of how I did something similar. In Tie::Handle::CSV [cpan.org] I just overloaded 'header' in the constructor. It can be a simple boolean to indicate whether the file has a header, or an array ref to assign the header.
  • You've now got it, and I also give you bind_columns!

    I would value feedback, and probably some improvements to the docs, like adding the new stuff to the SYNOPSIS.
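
    A minimal sketch of what bind_columns () looks like from the caller's side, assuming a Text::CSV_XS release that includes it. The sample data and the variable names $name and $lang are hypothetical:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical two-column sample data, held in memory.
my $data = "Mark,Perl\nMerijn,Perl\n";
open my $io, '<', \$data or die "Cannot open in-memory file: $!";

my $csv = Text::CSV_XS->new ({ binary => 1 });

# Bind one scalar per column; getline () then stores each parsed
# field into these same scalars instead of building a new row.
$csv->bind_columns (\my ($name, $lang));

my @seen;
while ($csv->getline ($io)) {
    push @seen, "$name:$lang";
}
close $io;

print "$_\n" for @seen;
```

    The number of bound scalars has to match the number of fields in each row.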

      file: $CPAN/authors/id/H/HM/HMBRAND/Text-CSV_XS-0.40.tgz
      size: 85057 bytes
       md5: cb8b2af20925b832159f34eed9793666

    2008-04-07  0.40 - H.Merijn Brand   <h.m.brand@xs4all.nl>

            * Implemented getline_hr () and column_names () RT 34474
              (suggestions accep

    --
    Enjoy, have FUN! H.Merijn