use Perl: All the Perl that's Practical to Extract and Report

Journal of davorg (18)

Wednesday August 18, 2004
03:19 AM

Document Conversion

[ #20448 ]

This is just thinking aloud about something that came up in conversation in the office yesterday. I doubt I'll have any time to do anything about it at work, so I might play with a few ideas in my spare (?) time.

We're geeks. Specifically most of us are Unix geeks (even the iGeeks are Unix geeks now). And Unix geeks like text files. We don't use a word processor unless someone is holding a gun to our head. We're far happier using our favourite text editor to create POD or DocBook or something like that. I'm sure I don't need to explain the advantages of text files over proprietary binary formats.

But we need to interact with the rest of the company. And the rest of the company like Word documents. The very idea of reading a plain text document fills them with the deepest dread.

That's not a problem. We can create a document in POD, use Pod::DocBook to convert it to DocBook and then use one of the db2foo tools to convert it into something they can read in Word (probably RTF I guess).
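
As a rough sketch of that chain, here is a small Python wrapper that builds the two commands and runs them in order. It assumes the pod2docbook script that ships with Pod::DocBook accepts an input file followed by an output file, and that db2rtf (from docbook-utils) takes the DocBook file as its only argument and writes an .rtf file itself. The function names and the .sgml extension are my own choices for illustration, so check both tools' documentation before relying on this.

```python
import shutil
import subprocess
from pathlib import Path


def build_pipeline(pod_file):
    """Return the commands for POD -> DocBook -> RTF without running them."""
    pod = Path(pod_file)
    docbook = pod.with_suffix(".sgml")  # assumed name for the intermediate file
    return [
        # Assumed usage: pod2docbook infile outfile
        ["pod2docbook", str(pod), str(docbook)],
        # Assumed usage: db2rtf docbook-file (the tool names the .rtf output)
        ["db2rtf", str(docbook)],
    ]


def run_pipeline(pod_file):
    """Run each stage in order, stopping at the first failure."""
    for command in build_pipeline(pod_file):
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"{command[0]} is not on PATH")
        subprocess.run(command, check=True)
```

Calling build_pipeline("spec.pod") only returns the command lists, which makes it easy to inspect the plan before run_pipeline("spec.pod") actually invokes the tools.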

But it's not quite that simple. The non-techs like their Word docs to have a certain look. They create templates for different types of document that define fonts, header styles, required sections, watermarked logos and things like that. And my auto-generated RTF file won't have all of that.

Until now I've got round that by creating the RTF file, opening it in OpenOffice and applying the formatting from another document that was created using the template. But this is a soul-destroying (not to mention error-prone) activity.

So what I'm thinking is that we need (well, maybe "need" is overstating the case a bit) an application that somehow parses a Word template file, extracts all of the formatting information and builds a file (probably DSSSL) that db2rtf can use to create an RTF file containing all the correct formats. Or maybe it just creates an XSLT file that transforms DocBook to the MS Word XML format. Or something like that.
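
To make the "extracts all of the formatting information" step a bit more concrete, here is a minimal Python sketch. It sidesteps the binary .dot format entirely and assumes the template has first been saved in the Word 2003 XML format, where (as far as I understand it) styles live in w:style elements under w:styles and font sizes are stored in half-points. The SAMPLE document, the helper names and the shape of the returned dictionary are all hypothetical, and a real template would carry far more properties than the font and size picked out here.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Word 2003 XML format (an assumption about the input).
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Hypothetical, heavily trimmed template used only for illustration.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:styles>
    <w:style w:type="paragraph" w:styleId="Heading1">
      <w:name w:val="heading 1"/>
      <w:rPr>
        <w:rFonts w:ascii="Arial"/>
        <w:sz w:val="32"/>
      </w:rPr>
    </w:style>
    <w:style w:type="paragraph" w:styleId="Normal">
      <w:name w:val="Normal"/>
      <w:rPr>
        <w:rFonts w:ascii="Times New Roman"/>
        <w:sz w:val="24"/>
      </w:rPr>
    </w:style>
  </w:styles>
</w:wordDocument>"""


def attr(element, name):
    """Read a w:-namespaced attribute, tolerating a missing element."""
    return None if element is None else element.get(f"{{{W}}}{name}")


def extract_styles(xml_text):
    """Map each style ID to its display name, font and size in points."""
    root = ET.fromstring(xml_text)
    styles = {}
    for style in root.findall("w:styles/w:style", NS):
        size = attr(style.find("w:rPr/w:sz", NS), "val")
        styles[attr(style, "styleId")] = {
            "name": attr(style.find("w:name", NS), "val"),
            "font": attr(style.find("w:rPr/w:rFonts", NS), "ascii"),
            # Sizes are assumed to be in half-points, so 32 means 16pt.
            "points": int(size) / 2 if size else None,
        }
    return styles
```

A dictionary like the one extract_styles(SAMPLE) returns is the sort of thing a generator could then turn into a DSSSL or XSLT stylesheet, which is the part I haven't thought through yet.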

But like I said, I'm just thinking aloud here. Does anyone know if anything like this already exists? Or have any idea where I might start? Does MS publish the format of its .dot files anywhere?

Or would I be completely wasting my time?

  • If you can convert the template to an OO.o template then it's just an XML file (and a fairly well laid out one too - I wrote an article about it ages ago). From there it's cake.
  • I've done a lot of work generating RTF with DSSSL. This is not a path you want to follow. The output is generally good enough for many uses, but does not reach into the more complex formatting you are looking to achieve.

    First of all, if you're talking DSSSL, you're talking OpenJade, which is a large package, somewhat difficult to build, and very difficult to troubleshoot unless you have some background in SGML arcana, or wish to acquire it. It's like expending the effort to master Victorian era metall

  • Word (recent incarnations of it, at any rate) does HTML fairly well. It certainly has no problem opening HTML documents. Would that fit? You could do headers, footers, watermarks etc. with HTML. Heck, just create a blank document, export to HTML, run HTML Tidy on it and you have your own HTML template to fill in with text.

    I was also going to suggest PDF, because a few decent toolchains exist for generating fancy PDF; but editing it would pose a few problems (unless everyone has Acrobat installed or som