
All the Perl that's Practical to Extract and Report

use Perl

malte (1708)

http://joose-js.blogspot.com/

Working on the Joose JavaScript meta system (Blog) [blogspot.com] and blok [appspot.com], a web-based application for collaborative UI prototyping.

Journal of malte (1708)

Friday April 25, 2003
02:40 AM

Screenscraping table-lookalike

[ #11842 ]

We are implementing cross-media publishing for one of our customers. They have a rather large catalogue that is already available online on a database-driven website; however, the website does not carry all of the data. Each product has some very technical tables associated with it that are only available as PDF or in the offline catalogue.

The plan is to generate all catalogues from the same database. That means we need all the data that is currently in the offline catalogue in the database. The problem is that this data only exists as QuarkXPress documents. Every number in the tables has its own layer and is carefully placed by hand, so there is absolutely no structure in the documents.

Does anybody here have an idea how one could have a program look at a page, detect what looks like a table, and then output that data in some structured form?
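To make the question concrete, here is a minimal sketch of one possible approach, not a tested solution. It assumes the text fragments can somehow be exported together with their page coordinates (for example from the PDFs), which is itself an open question. The names `cluster`, `nearest`, `fragments_to_table` and the `tolerance` parameter are hypothetical. The idea is to group hand-placed numbers whose y coordinates are nearly equal into rows and whose x coordinates are nearly equal into columns.

```python
def cluster(values, tolerance):
    """Group 1-D coordinates that lie within `tolerance` of their
    neighbour and return the mean of each group."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tolerance:
            groups[-1].append(v)   # close enough: same row/column
        else:
            groups.append([v])     # gap too large: start a new one
    return [sum(g) / len(g) for g in groups]


def nearest(centers, v):
    """Index of the cluster centre closest to coordinate v."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))


def fragments_to_table(fragments, tolerance=3.0):
    """fragments: list of (x, y, text) tuples (assumed input format,
    with y growing down the page). Returns a list of rows, each a
    list of cell strings; empty cells are ''."""
    row_centers = cluster([y for _, y, _ in fragments], tolerance)
    col_centers = cluster([x for x, _, _ in fragments], tolerance)
    table = [["" for _ in col_centers] for _ in row_centers]
    for x, y, text in fragments:
        r = nearest(row_centers, y)
        c = nearest(col_centers, x)
        # Fragments landing in the same cell are joined with a space.
        table[r][c] = (table[r][c] + " " + text).strip()
    return table


# Hand-placed fragments are never perfectly aligned, hence the tolerance.
fragments = [
    (10.2, 100.0, "A"), (50.0, 100.9, "B"),
    (9.8, 120.1, "1.5"), (50.4, 119.7, "2.7"),
]
print(fragments_to_table(fragments))
```

Whether a fixed tolerance is good enough for hand-placed layouts is unclear; right-aligned numbers or cells spanning several columns would need more care.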
