All the Perl that's Practical to Extract and Report

  • First, there's not really a "standard" for CSV. It really means whatever someone wants to throw at you. I had a project last year where multiple business partners would send me "CSV" data, and no two were the same. Some quoted every field. Some only quoted fields that needed it. Some escaped double quotes by doubling them. Some used backslashes. It was a mess.

    Second, don't use Text::CSV. Use Text::CSV_XS [cpan.org]. It's got far more parameters for your tuning enjoyment.
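
To make the two escaping conventions mentioned above concrete, here is a minimal core-Perl sketch. `unquote_field` and its `$style` argument are hypothetical names made up for illustration, and the sketch assumes the field has already been isolated from the rest of the line.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the surrounding quotes from one already-isolated
# field and undo its escaping. $style is 'double' or 'backslash'.
sub unquote_field {
    my ($field, $style) = @_;
    return $field unless $field =~ /\A"(.*)"\z/s;   # unquoted: leave as-is
    my $inner = $1;
    if ($style eq 'backslash') {
        $inner =~ s/\\(.)/$1/gs;    # \" becomes ", \\ becomes \
    } else {
        $inner =~ s/""/"/g;         # "" becomes "
    }
    return $inner;
}

# The same value as two different partners might send it.
print unquote_field('"say ""hi"""', 'double'), "\n";
print unquote_field('"say \\"hi\\""', 'backslash'), "\n";
```

Both calls should yield the same text. With Text::CSV_XS you would not hand-roll this; its `quote_char`, `escape_char` and `sep_char` attributes cover the same variations.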

    --
    xoa

    • I'm pretty sure Text::CSV_XS is the successor to Text::CSV. It's always a good idea to search CPAN [cpan.org] and look for more recent modules.

      For even more enjoyment, see if you can make use of DBD::CSV.

      --
      J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
      • DBD::CSV was my first choice. However, the file we are being sent contains additional record types: one or more comment records (a ' as the first character) and one header record (a # as the first character).

        Plus it was easier to parse the file directly rather than store it locally, parse it, then delete it.
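
A rough sketch of how those extra record types could be separated before the data lines reach a CSV parser. `split_records` is a hypothetical helper, and it assumes every record sits on a single line, so a quoted field with an embedded newline would be misclassified.

```perl
use strict;
use warnings;

# Sort raw lines into comment records ('), the header record (#) and data.
sub split_records {
    my (@lines) = @_;
    my %out = (comments => [], header => undef, data => []);
    for my $line (@lines) {
        chomp $line;
        if    ($line =~ /\A'/) { push @{ $out{comments} }, substr($line, 1) }
        elsif ($line =~ /\A#/) { $out{header} = substr($line, 1) }
        else                   { push @{ $out{data} }, $line }
    }
    return \%out;
}

my $records = split_records(
    "'generated nightly\n",
    "#id,name\n",
    "1,\"Smith, J\"\n",
);
print "header: $records->{header}\n";
print "data:   $_\n" for @{ $records->{data} };
```

Only the entries in `data` would then be handed to the CSV parser, which also avoids writing the file to disk first.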

    • But it still has the weird notion of not allowing us to use our alphabet unless we enter binary mode, which disables any check on characters.
      Useful, but I do have a hard time explaining why you have to use binary mode to write non-binary data!

      I would love for it to have an eight-bit mode where control characters are forbidden, i.e. 0x00-0x1f, 0x7f-0x9f and 0xff (if I got my ranges right). Of course this would annoy M$ users, who have some printable characters embedded in the high control range (0x80-0x9f).
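
For what it's worth, such a check is a one-line regex. This is only a sketch of the idea, not anything Text::CSV_XS offers: `is_eight_bit_clean` is a made-up name, and it assumes the control ranges are C0 (0x00-0x1f), DEL (0x7f) and C1 (0x80-0x9f). It leaves 0xff alone, since that byte is a printable letter in Latin-1.

```perl
use strict;
use warnings;

# Hypothetical "eight-bit mode" check: accept any byte string that contains
# no C0 control, no DEL and no C1 control.
sub is_eight_bit_clean {
    my ($bytes) = @_;
    return $bytes !~ /[\x00-\x1f\x7f-\x9f]/;
}

print is_eight_bit_clean("caf\xe9")   ? "ok\n" : "rejected\n";   # Latin-1 e-acute
print is_eight_bit_clean("bad\x07")   ? "ok\n" : "rejected\n";   # BEL control
print is_eight_bit_clean("quote\x93") ? "ok\n" : "rejected\n";   # cp1252 curly quote
```

The last case is the annoyance mentioned above: Windows code pages put printable characters in 0x80-0x9f, so a strict check rejects them.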
      • This was the issue I had with it. Why should I have to switch to binary just to use the extended character set? The fix I did, apart from cleaning up the bizarre nesting and the blank lines that confuse the layout of blocks, was the following chunk added to the _bite() function, just before the last "} else {" line:

        } elsif ($in_quotes) {
            # an extended character in quotes: move it from the front of
            # the line buffer onto the end of the current field
            $$piece_ref .= substr($$line_ref, 0, 1);
            substr($$line_ref, 0, 1) = '';

        Well, it does the job.

    • Text::CSV_XS seemed like overkill for what I wanted. I have my own patch to Text::CSV now, which handles extended characters, provided they are contained within quotes.

      Your example still follows the standard as I understand it. Fields can have quotes around them, or the quotes can be omitted if the field doesn't contain the quote character or the field separator. The standard way of escaping double quotes is to double them. Much like SQL in that respect.
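
The quoting rule as described can be sketched in a few lines of core Perl. `quote_field` is a hypothetical helper. Quoting fields that contain a line break is my own assumption on top of the rule stated above.

```perl
use strict;
use warnings;

# Quote a field only when it needs it; escape embedded quotes by doubling.
sub quote_field {
    my ($value, $sep) = @_;
    $sep = ',' unless defined $sep;
    # Needs quoting if it holds a quote, the separator, or (assumed) a newline.
    return $value unless $value =~ /["\r\n]|\Q$sep\E/;
    (my $escaped = $value) =~ s/"/""/g;
    return qq{"$escaped"};
}

print join(',', map { quote_field($_) } 'plain', 'Smith, J', 'say "hi"'), "\n";
```

This prints `plain,"Smith, J","say ""hi"""`: the first field is left bare, the second is quoted because of the comma, and the third has its quotes doubled.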

      • Text::CSV is quite a bare module, which will be updated *very* soon now.

        The new Text::CSV will include a pure-Perl version of Text::CSV_XS and will itself be just a wrapper. If Text::CSV_XS is installed, it will use it; otherwise, it will use the bundled Text::CSV_PP (or Text::CSV_PurePerl as the snap currently states).
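
The "use XS if installed, otherwise fall back" idea usually comes down to a guarded `require`. The following is a sketch of that general pattern, not the actual Text::CSV wrapper code. `pick_backend` is a hypothetical name.

```perl
use strict;
use warnings;

# Return the first module in @candidates that can be loaded, or undef.
sub pick_backend {
    my (@candidates) = @_;
    for my $module (@candidates) {
        # require with a string wants a file path: Foo::Bar -> Foo/Bar.pm
        (my $file = "$module.pm") =~ s{::}{/}g;
        return $module if eval { require $file; 1 };
    }
    return;
}

my $backend = pick_backend('Text::CSV_XS', 'Text::CSV_PP');
print defined $backend ? "using $backend\n" : "no CSV backend installed\n";
```

Listing the XS module first means the fast implementation wins whenever it is available, and the pure-Perl one is only loaded as a fallback.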

        Text::CSV_XS is much faster than the pure-Perl version(s).

        See also http://www.perlmonks.org/?node=617577 [perlmonks.org]
        --
        Enjoy, have FUN! H.Merijn