TorgoX (1933)

  sburkeNO@SPAMcpan.org
http://search.cpan.org/~sburke/

"Il est beau comme la retractilité des serres des oiseaux rapaces [...] et surtout, comme la rencontre fortuite sur une table de dissection d'une machine à coudre et d'un parapluie !" -- Lautréamont

Journal of TorgoX (1933)

Sunday August 17, 2003
06:04 AM

PDF

[ #14165 ]
Dear Log,

I've been looking at PDF as a format lately. Whereas I could sort of wrap my mind around PostScript, and had no trouble with RTF, PDF is not so friendly.

PDF is an odd format, internally. It's a strange blend of simplicity, human-readability, optimization, and inscrutability. For example, it's very like some kind of simple data-dumping format where you just have:

defineobject "Blorch" => {
    key1 => val1,
    key2 => val2,
    key3 => val3,
    ...
}

...over and over, with the understanding that there's some special semantics imputed to magically-named objects and keys to get the ball rolling.

Except that PDF is only sort of like that -- the object IDs aren't alphanumeric symbol names, they're integers, and there's all sorts of byte-counts and byte-offsets being tossed around. It's as if an XML file ended with a lookup table of all the root's children's byte-offsets. PDF is complicated enough that a simple "Hello World" document is several hundred bytes, because of all the per-document overhead.
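To make the integer IDs and byte offsets concrete, here is a minimal sketch in Python of what a "Hello World" PDF looks like when assembled by hand. It assumes the classic PDF 1.x layout (header, numbered objects, cross-reference table, trailer) and the built-in Helvetica font. The name `build_hello_pdf` and the page geometry are illustrative choices, not anything from a real library.

```python
def build_hello_pdf(text="Hello World"):
    """Assemble a one-page PDF and return it as bytes (illustrative sketch)."""
    # Page content: show the text in 24pt type, one inch in from the top-left.
    # Assumption: text is plain ASCII with no parentheses or backslashes,
    # which would need escaping inside a PDF string.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")

    # Object bodies, numbered from 1. Objects refer to each other as "N 0 R".
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))          # byte offset where this object starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # The lookup table at the end: one fixed-width entry per object.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"       # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


pdf = build_hello_pdf()
print(len(pdf), "bytes")
```

Five objects, a cross-reference table, and a trailer are needed before a single word shows up on the page, which is where the per-document overhead comes from.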

Here, this explains it pretty well.

PDF is not a crazy format by any means, although there were some odd decisions made. For example, in the original spec, all the document-indexing data is at the very end of the file, whereas clearly if you were thinking about streaming a file to web clients, you'd probably put it at the beginning instead. (What they were thinking of was not that, but the problem of "fast saves" -- changing a disk file by just adding new objects to it and appending a new cross-reference table.) Of course, in time, they did think of streaming, so for that and other reasons, the PDF spec has become a massive massive document.
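The "fast save" idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a real PDF writer: it assumes classic cross-reference tables (not the later cross-reference streams), and the function `append_update` and its parameters are hypothetical names. The caller supplies the trailer's `/Size` and `/Root` values instead of having them parsed out of the old trailer.

```python
import re


def append_update(pdf, obj_num, body, size, root=b"1 0 R"):
    """Return pdf plus one new or replaced object, leaving the old bytes untouched."""
    # The last "startxref" in the file points at the current cross-reference table.
    found = re.findall(rb"startxref\s+(\d+)", pdf)
    if not found:
        raise ValueError("no startxref found")
    prev_xref = int(found[-1])

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"

    # Append the object itself and remember where it landed.
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + body + b"\nendobj\n"

    # Append a cross-reference section that covers just this one object.
    xref_offset = len(out)
    out += b"xref\n%d 1\n" % obj_num
    out += b"%010d 00000 n \n" % obj_offset
    # /Prev chains back to the older table, so a reader starting at the end
    # of the file can still find every object that was not changed.
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


# Stand-in for an existing file: only its tail matters for this sketch,
# so this is NOT a complete, valid PDF.
base = b"%PDF-1.4\n(existing objects and xref table)\nstartxref\n123\n%%EOF\n"
updated = append_update(base, 6, b"<< /Note (added later) >>", size=7)
print(updated[len(base):].decode("ascii"))
```

Nothing before the old end-of-file marker is rewritten, which is what makes the save fast, and it is also why a reader has to start from the end of the file to find the newest table.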

Suddenly the MIDI File Format sounds friendly in comparison.

Long story short: Thank God there's CPAN modules for dealing with PDFs!

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Yes, PDF is a bear. I believe most of the Perl modules rely on a C library (whose name escapes me). Oddly enough, PHP comes bundled with support for PDF writing and it uses that same C-lib. I believe you can embed fonts in PDF, which partially addresses my long-standing beef with typesetting in Unix (it hasn't changed in 20 years).