All the Perl that's Practical to Extract and Report

use Perl

Friday November 14, 2003
04:32 PM

TPC presentations

[ #15781 ]

Ziggy sent me 300+ MB of O'ReillyCon presentations, so I have a lot of reading to do.

I tried to download some others, but a lot of them had been converted to HTML and wanted me to click through a lot of pages. I cannot really do that here. I download a lot of stuff, then look at it offline. Bah, humbug!

Somewhere I have a PowerPoint to PDF converter that I wrote at the first MacOSXCon. The PDF turned out to be huge since every page had to reproduce the background image. There is probably some PostScripty way around that, though.

  • That would spider a site, download all the pages and change the links to point at your on-disk copy. Those used to be really common when people were on slow 36K modem links, but I haven't used one in years and I can't recall the names of any such programs. My naive Google searching for something like this hasn't immediately turned up anything. Anybody know of a good one? Seems like a fairly easy Perl program to write, actually.

    Also, I used to use Plucker [plkr.org] for Palm devices, which does this, but do

    • by ziggy (25) on 2003.11.14 18:11 (#25813) Journal
      Seems like a fairly easy perl program to write, actually.
      Actually, it's harder to write than you'd think. There are lots of edge cases to handle: not only do you need to fetch images and munge the <img ...> tags, but also all of the frames, iframes, CSS stylesheets, media files and so on. Oh, and don't forget to rationalize all of the URLs. LWP can convert relative to absolute URLs for you, but you still need to find both kinds and replace them with paths relative to your local copy on the filesystem.

      I tried it a few times. It's not the 30-minute hack it appears to be. You're probably better off using GNU wget [gnu.org] instead of rewriting it from scratch in Perl.
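To make the URL-rationalizing step concrete, here is a minimal sketch in Python (the thread talks about Perl and LWP; this is a stand-in, not anyone's actual tool). It resolves each link against the page URL, maps it to an on-disk path, and computes the relative link to write back. The `host/path` layout, the `index.html` default, and every name below (`URL_ATTRS`, `local_path`, `relative_link`, `collect_rewrites`) are assumptions made for illustration. It only finds the links; it ignores CSS `url(...)`, media files, and query strings, which is exactly the kind of edge case described above.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tag -> attribute that may carry a URL. Deliberately incomplete.
URL_ATTRS = {
    "a": "href", "link": "href", "img": "src",
    "script": "src", "frame": "src", "iframe": "src",
}

def local_path(abs_url):
    """Map an absolute URL to a hypothetical on-disk path: host/path."""
    parts = urlsplit(abs_url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumed name for directory-style URLs
    return posixpath.join(parts.netloc, path.lstrip("/"))

def relative_link(page_url, link):
    """Return the link rewritten relative to the saved page, or None."""
    abs_url = urljoin(page_url, link)  # relative -> absolute
    if urlsplit(abs_url).scheme not in ("http", "https"):
        return None  # leave mailto:, javascript:, etc. alone
    here = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(abs_url), start=here)

class LinkCollector(HTMLParser):
    """Collect {original link: rewritten link} for one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.rewrites = {}

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                new = relative_link(self.page_url, value)
                if new is not None:
                    self.rewrites[value] = new

def collect_rewrites(page_url, html):
    collector = LinkCollector(page_url)
    collector.feed(html)
    return collector.rewrites
```

Calling `collect_rewrites("http://example.com/docs/intro.html", html)` returns a mapping from each original link to its local replacement. Actually fetching the files and rewriting the markup in place is the part left out here, and it is where most of the work goes.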

      • Hey, cool, I didn't know wget would convert links for you. Thanks!

        Actually, I can think of a good use for this. There're a lot of web-based docs that I've used that don't come in the form of one big HTML file. It's sometimes slow to browse these from work. You get the idea...

        Hmmmm... Can I maintain a bunch of pages on my local hard drive compressed such that they will uncompress when I access them from a browser? I could run Apache on my desktop, sure, but how do I build something that would support

        • With a little (mod_perl) URL translation and content filtering, you could easily read some compressed files and present them uncompressed. I think there even used to be a browser that could gunzip the right sort of response.
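A rough sketch of that idea, using Python's standard library rather than Apache and mod_perl (so it is a stand-in for the approach, not the setup described above). It assumes pages are stored under a hypothetical `mirror` directory, either plain or with a `.gz` suffix, and decompresses them when requested. `DOC_ROOT`, `load_page`, `GunzipHandler` and `serve` are illustrative names.

```python
import gzip
import mimetypes
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

DOC_ROOT = Path("mirror")  # hypothetical directory holding the saved pages

def load_page(root, url_path):
    """Return the page bytes for url_path, gunzipping if needed, else None."""
    rel = url_path.split("?", 1)[0].lstrip("/") or "index.html"
    root = Path(root).resolve()
    target = (root / rel).resolve()
    if root not in target.parents:
        return None  # refuse paths that escape the document root
    if target.is_file():
        return target.read_bytes()
    packed = target.with_name(target.name + ".gz")
    if packed.is_file():
        return gzip.decompress(packed.read_bytes())
    return None

class GunzipHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = load_page(DOC_ROOT, self.path)
        if body is None:
            self.send_error(404)
            return
        # Guess the type from the requested name, not the .gz file on disk.
        ctype = mimetypes.guess_type(self.path.split("?", 1)[0])[0]
        self.send_response(200)
        self.send_header("Content-Type", ctype or "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Serve DOC_ROOT on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), GunzipHandler).serve_forever()
```

Running `serve()` and browsing to `http://127.0.0.1:8000/` would show the pages uncompressed. The other route hinted at above is to send the compressed bytes unchanged with a `Content-Encoding: gzip` header and let the browser do the gunzipping.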
      • Well, getting 80% of the problem done is about 30 minutes, give or take a day. Since I tend to stay away from sites with goofy features, that 80% of the problem is 99% of my experience, so I don't have to do the other 20% of the problem just to get the 1% benefit.

        In fact, I had to do this to convert some IE web archives so I can read them.
    • I have such programs on my computer, but I don't always get to use my computer.

      On MacOS X I use either SiteSucker or WebDevil.

      Last night, however, I was stuck on an Army computer.
  • PDF Conversions (Score:3, Informative)

    by ziggy (25) on 2003.11.14 18:06 (#25812) Journal
    Hey, I already used a PowerPoint to PDF converter on those slides. It's called Keynote. ;-)

    Keynote 1.0 had that same problem. The 1.1 update improved the PDF output somewhat, but it's still a little weighty. I think they managed to include the background image once (vs. once per slide), but that's just guesswork based on how the filesize went down 5x-10x.

    The PDF files that Keynote generates are still pretty big. I'm just guessing here, but Keynote may be using TIFF for all of its images, instead of something more svelte, like PNG or JPEG.