Journal of samtregar (2699)
http://sam.tregar.com/

Sunday October 29, 2006
05:44 PM

Mac line-endings and Text::CSV_XS

[ #31443 ]

To say that Text::CSV_XS has trouble with Mac line-endings (\015) is something of an understatement. Not only will it refuse to parse a file that uses them to end lines, it won't even accept them inside a field in binary mode. Binary mode is advertised as allowing any character as long as it's inside a quoted string, so this is clearly (in my opinion) a bug.
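The binary-mode rule is easy to state in code. Below is a minimal pure-Perl sketch of a quote-aware record splitter. It assumes a hypothetical split_record() helper that handles one record at a time; it is not Text::CSV_XS's implementation. Inside quotes every character, \015 included, is field data. Outside quotes, an unquoted CR or LF ends the record.

```perl
use strict;
use warnings;

# Hypothetical helper (not Text::CSV_XS code): split one CSV record into
# fields. Inside a quoted field every character, including "\015", is data;
# a doubled quote ("") stands for a literal quote character.
sub split_record {
    my ($line) = @_;
    my @fields;
    my $field  = '';
    my $quoted = 0;                      # are we inside a quoted field?
    my @chars  = split //, $line;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($quoted) {
            if ($c eq '"') {
                if ($i + 1 < @chars && $chars[$i + 1] eq '"') {
                    $field .= '"';       # escaped quote
                    $i++;
                } else {
                    $quoted = 0;         # closing quote
                }
            } else {
                $field .= $c;            # anything goes here, "\015" included
            }
        } elsif ($c eq '"') {
            $quoted = 1;                 # opening quote
        } elsif ($c eq ',') {
            push @fields, $field;        # field separator
            $field = '';
        } elsif ($c eq "\015" || $c eq "\012") {
            last;                        # unquoted CR or LF ends the record
        } else {
            $field .= $c;
        }
    }
    die "unterminated quoted field\n" if $quoted;
    push @fields, $field;
    return @fields;
}
```

With this sketch, split_record(qq{"a\015b",c}) returns two fields, the first of which keeps its embedded carriage return.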

I dug into the code intending to solve both problems, but only managed to fix the latter. Actually supporting \015 as a line-ending character looks like it would be hard. For my purposes it wouldn't help unless it were automatic: if I have to tell Text::CSV_XS that a file has Mac line-endings, I might as well just translate them myself. That's how Unix and Windows line-endings work now: you don't have to tell the module what to expect, and you can even mix them in a single file. Unfortunately, the way it accomplishes this feat doesn't extend well to \015, at least as far as I can tell.
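Translating the line endings yourself takes only a few lines. Here is a minimal sketch that assumes two hypothetical helpers, normalize_eol() and open_normalized(), neither of which is part of Text::CSV_XS. It rewrites CRLF and lone CR to LF, so files with mixed endings come out uniform. One caveat: it also rewrites any CR that sits inside a quoted field.

```perl
use strict;
use warnings;

# Hypothetical pre-pass (not part of Text::CSV_XS): turn every CRLF
# (Windows) or lone CR (classic Mac) into LF, so only "\012" remains.
# Caveat: CRs inside quoted fields are rewritten too.
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\015\012?/\012/g;    # CRLF or bare CR -> LF
    return $text;
}

# Hypothetical wrapper: slurp a file in raw mode, normalize it, and return
# an in-memory filehandle over the normalized text.
sub open_normalized {
    my ($filename) = @_;
    open my $in, '<:raw', $filename or die "$filename: $!";
    my $text = do { local $/; <$in> } // '';
    close $in;
    $text = normalize_eol($text);
    open my $fh, '<', \$text or die "in-memory open failed: $!";
    return $fh;
}
```

The handle returned by open_normalized() can then be read line by line, or, presumably, handed to $csv->getline() in place of the original file handle. Slurping is the simplest approach but holds the whole file in memory.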

In any case, here's the bug fix: mac.diff. After you apply it you should find that stray \015 characters work just fine in binary mode. I also sent it to the maintainer, but since the module hasn't had a release in 5 years I'm not exactly holding my breath!

I came very close to going on an optimization mission while I was in the code. The state machine looks like it could benefit from some tweaking and the way lines are read looks like it could be improved. This would be pretty foolish though - Text::CSV_XS is already so fast that I've never seen it show up in a profile on a serious app. Usually I'm reading CSVs so I can load data into a database via DBI, by which point Text::CSV_XS is unlikely to be a bottleneck.

-sam

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I thought I heard that Jeff Zucker, a.k.a. jZed on Perlmonks, was set on taking over this module, but I think "he didn't find it worthwhile already to release a new version". That last part is paraphrased from something he told me in the Chatterbox. I'm no longer sure this was the module he was talking about, but I think it was.

    So... Ask him?
  • If Jeff doesn't want to take over the module, I can apply the patch and release a new version. I don't want to maintain it, but I do help modules find new homes. :)
    • So noted. I think he's in - it's listed on his CPAN author page as a module registered to him. I haven't heard back from him yet, but I imagine I will soon.

      -sam

  • Hi Sam,

    I know I'm late to the party, but install PerlIO::eol and give the following a try:

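    use strict;
    use Text::CSV_XS;

    my $filename = shift or die "usage: $0 FILE\n";
    my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
    # ":eol(Native)" comes from PerlIO::eol: it converts CR, LF and CRLF to the native line ending on read
    open my $io, "<:raw:eol(Native)", $filename or die "$filename: $!";
    while (my $row = $csv->getline($io)) {
      # ... process @$row here ...
    }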

    Text::CSV_XS can read from an IO::Handle object, and with the PerlIO::eol layer on it, that handle converts line endings on the fly for you. This also has the added benefit of handling fields with embedded newlines.