
Journal of LTjake (4001)

Friday February 22, 2008
10:23 AM

Automatic thumbnails of web sites

[ #35727 ]

At $work, we decided that we kind of liked the idea of showing thumbnails of the sites we've linked in a particular section of our site. Naturally, there are existing services that can be used to do this. However, we weren't too hip on relying on those services.

We decided to try and make our own thumbnail service. We took an old machine, put Ubuntu (7.10) on it, and created a fairly simple script to control Firefox and generate the screenshots. It uses X11::GUITest to do the automation and Imager::Screenshot to capture the screen:

use strict;
use warnings;

use Imager::Screenshot ();
use X11::GUITest       ();
use Digest::MD5        ();
use CGI                ();

my $start   = '~/blank.html';
my @urls    = load_config( shift );
my $destdir = shift;
my $id      = close_and_reopen_firefox();

my $count = 0;
for my $url ( map { CGI->unescapeHTML( $_ ) } @urls ) {
    # skip "comments"
    next if $url =~ m{^#};
    # skip existing screenshots
    next if -e gen_filename( $url );

    load_page( $url );
    sleep( 20 );
    take_screenshot( $id, $url );

    # reload firefox after 100 urls
    if ( ++$count == 100 ) {
        $id    = close_and_reopen_firefox( $id );
        $count = 0;
    }
}

# close firefox
X11::GUITest::SendKeys( "%({F4})" );

sub gen_filename {
    my $url = shift;
    return "${destdir}/" . Digest::MD5::md5_hex( $url ) . '.jpg';
}

sub take_screenshot {
    my $id  = shift;
    my $url = shift;
    my $i   = Imager::Screenshot::screenshot( id => $id )
        or die 'Screenshot failed: ' . Imager->errstr;

    # remove the scrollbar + scale
    $i = $i->crop( right => $i->getwidth - 15 );
    $i = $i->scale( xpixels => 150 );
    $i->write( file => gen_filename( $url ), jpegquality => 75 )
        or warn 'Cannot write screenshot: ' . $i->errstr . "\n";
}

sub load_page {
    my $url = shift;
    X11::GUITest::SendKeys( '%({LEF})' );    # go "back"
    X11::GUITest::SendKeys( "^(l)" );
    X11::GUITest::WaitWindowViewable( 'Open Web Location' );
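    # quote characters SendKeys would treat as special (e.g. the % in %20)
    X11::GUITest::SendKeys( X11::GUITest::QuoteStringForSendKeys( $url ) . '{ENT}' );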
}

sub close_and_reopen_firefox {
    my $id = shift;
    if ( $id ) {
        X11::GUITest::SetInputFocus( $id );
        X11::GUITest::SendKeys( "%({F4})" );
    }

    X11::GUITest::StartApp( "firefox $start" );
    ( $id ) = X11::GUITest::WaitWindowViewable( 'Mozilla Firefox' )
        or die "Firefox window did not appear\n";
    sleep( 2 );
    X11::GUITest::SetInputFocus( $id );
    X11::GUITest::SendKeys( "{F11}" );

    return $id;
}

sub load_config {
    my $file = shift;

    open( my $data, '<', $file ) or die "Cannot open $file: $!";
    my @urls = split(
        "\n",
        do { local $/; <$data>; }
    );
    close( $data );

    return grep { length } @urls;
}

Now, you might have noticed that we close Firefox after 100 URLs. In reality, it never gets that far: things seem to segfault around 25 URLs in, and I don't particularly understand why it's so unstable. We've disabled the "session recovery" feature so Firefox won't get stuck asking questions on startup, and we've also turned off the fast back/forward history rendering in case it was leaking memory.

Hopefully someone will find this bit of code useful, and perhaps someone has some ideas as to how we can make this setup a little more stable.
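
One idea for the stability problem, offered as a rough sketch rather than a tested fix: restart the browser well before the point where it tends to crash, and also whenever its window has gone away. The helper below is hypothetical (process_with_restarts and its restart, is_alive and capture callbacks are names made up for illustration). It only contains the bookkeeping, so it can be tried without an X server.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: walk a list of
# URLs and restart the browser every "limit" URLs, or sooner if the
# is_alive callback says the browser window no longer exists.
# Returns the number of restarts performed after the initial start.
sub process_with_restarts {
    my %args = @_;

    my $id       = $args{restart}->();    # initial start, no old window
    my $count    = 0;
    my $restarts = 0;

    for my $url ( @{ $args{urls} } ) {
        my $alive = $args{is_alive}->( $id );
        if ( !$alive || $count >= $args{limit} ) {
            # only pass the old id along if there is still a window to close
            $id    = $args{restart}->( $alive ? $id : undef );
            $count = 0;
            $restarts++;
        }

        $args{capture}->( $id, $url );
        $count++;
    }

    return $restarts;
}
```

In the script above, restart could be \&close_and_reopen_firefox, is_alive could wrap X11::GUITest::IsWindow, and capture could call load_page, sleep and take_screenshot. A limit of around 20 is only a guess, based on the crashes showing up near 25 URLs.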
