
All the Perl that's Practical to Extract and Report

  • I've used antiword [demon.nl] in the past for reading MS Word docs, but I don't know how well it reads tables. You might want to give it a try.

    • Thank you! It looks like antiword converts to XML and/or DocBook, so maybe I can go that route. It says the support is still experimental, but I'll check it out. Even if it doesn't work today, it may work at some point in the future.

      --
      J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
    • Awesome!!! This is entirely feasible! Thank you!

      The tables come out as elements called <informaltable>. I can parse that XML, extract those, and convert them. In fact, this looks better than going through Excel, because the Excel route produces several "phantom" blank cells that my current program has to ignore.

      I'm not sure if I'm going to have to do this specific file again, but there's a good chance I might, and if I do I will attempt to program this process. If I don't for this f

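The "parse that XML, extract those" step could be sketched roughly as below. This is only a sketch under assumptions: the thread confirms the <informaltable> element, but the <row> and <entry> names are taken from standard DocBook tables rather than from actual antiword output, and the sub name and sample data are made up for illustration. It uses plain regexes so it runs on a bare Perl install; it does not handle nested tables or decode entities, and a real XML parser would be more robust.

```perl
use strict;
use warnings;

# Pull the rows out of every <informaltable> in a DocBook string.
# Returns a list of tables; each table is an array ref of rows,
# and each row is an array ref of cell strings.
# Assumption: rows are <row> and cells are <entry>, as in standard DocBook.
sub extract_informaltables {
    my ($xml) = @_;

    # Expand self-closing cells (<entry/>) so the patterns below see them.
    $xml =~ s{<entry\b([^>]*)/>}{<entry$1></entry>}g;

    my @tables;
    while ($xml =~ m{<informaltable\b[^>]*>(.*?)</informaltable>}gs) {
        my $body = $1;
        my @rows;
        while ($body =~ m{<row\b[^>]*>(.*?)</row>}gs) {
            my $row = $1;
            my @cells;
            while ($row =~ m{<entry\b[^>]*>(.*?)</entry>}gs) {
                my $text = $1;
                $text =~ s/<[^>]+>//g;      # drop inline markup such as <para>
                $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
                push @cells, $text;
            }
            push @rows, \@cells;
        }
        push @tables, \@rows;
    }
    return @tables;
}

# Made-up sample input, not real antiword output.
my $sample = <<'XML';
<article>
<informaltable><tgroup cols="2"><tbody>
<row><entry>Name</entry><entry>Qty</entry></row>
<row><entry><para>Widget</para></entry><entry>3</entry></row>
</tbody></tgroup></informaltable>
</article>
XML

# Print each table as tab-separated lines.
for my $table (extract_informaltables($sample)) {
    print join("\t", @$_), "\n" for @$table;
}
```

Empty cells come back as empty strings instead of being dropped, so column positions stay aligned from row to row.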
  • If you're on Win32, Win32::OLE [cpan.org] could help you.
    • Thanks for the pointer. Maybe I can do this entirely in pure Perl, and drop any intermediate file formats. :)

    • Maybe you could save your doc as XML or HTML, and then parse the result with your favorite XSLT or regex tool. Something along these lines:

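      use strict;
      use warnings;
      use Win32::OLE;

      # Word SaveAs format codes (WdSaveFormat enumeration)
      use constant wdFormatHTML => 8;
      use constant wdFormatXML  => 11;

      # Absolute paths are safest: Word resolves relative ones in its own working directory
      my ($src_name, $target_name) = @ARGV;

      my $msword = Win32::OLE->new("Word.Application", "Quit")   # "Quit" closes Word on cleanup
          or die "Cannot start Word: " . Win32::OLE->LastError;
      my $doc = $msword->Documents->Open($src_name);
      $doc->SaveAs($target_name, wdFormatXML);
      $doc->Close;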
      • Thank you for the concrete example. That looks like it may work very well.

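The "entirely in pure Perl, no intermediate file formats" idea raised earlier in the thread might look something like the following. Nobody in the thread posted this, so treat it as a hedged sketch: it assumes Windows with Word installed, the sub names and the path argument are hypothetical, and it relies on the Word object model's Tables, Rows, Columns, and Cell members. Tables with merged cells may need extra handling.

```perl
use strict;
use warnings;

# Word ends each cell's text with a carriage return plus a BEL (\x07)
# end-of-cell mark; strip those so only the visible text is left.
sub clean_cell_text {
    my ($text) = @_;
    $text =~ s/[\r\x07]+\z//;
    return $text;
}

# Return every table in a Word document as an array ref of tables,
# each table an array ref of rows, each row an array ref of cell strings.
# Windows-only: Win32::OLE is loaded lazily, so the rest of the file
# still compiles on other platforms.
sub read_word_tables {
    my ($path) = @_;    # hypothetical argument: absolute path to the .doc
    require Win32::OLE;

    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die 'Cannot start Word: ' . Win32::OLE->LastError();
    my $doc = $word->Documents->Open($path)
        or die "Cannot open $path: " . Win32::OLE->LastError();

    my @tables;
    for my $t (1 .. $doc->Tables->Count) {
        my $table = $doc->Tables->Item($t);
        my @rows;
        for my $r (1 .. $table->Rows->Count) {
            my @cells;
            for my $c (1 .. $table->Columns->Count) {
                # A cell that was merged away is not addressable; skip it.
                my $cell = $table->Cell($r, $c) or next;
                push @cells, clean_cell_text($cell->Range->Text);
            }
            push @rows, \@cells;
        }
        push @tables, \@rows;
    }

    $doc->Close(0);    # 0 = wdDoNotSaveChanges
    return \@tables;
}
```

A call such as read_word_tables('C:\docs\report.doc') (a made-up path) would then hand back the cell text directly, with no XML or Excel step in between.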