
Journal of Alias (5735)

Monday September 01, 2008
10:54 PM

Imager::Search 1.00 - Image recognition at 5 frames a second

[ #37326 ]

A couple of years ago, I first came up with the idea of doing basic image recognition using the Perl regular expression engine.

Using Imager for the underlying cross-platform image handling, I got a basic first version working by converting each pixel to an HTML colour code, building the image string and a search regex, and then converting the character match positions back into pixel terms.
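
As a rough illustration of that first approach, here is a minimal sketch in plain Perl. It is not the Imager::Search code: the pixels come from a hand-built array rather than from Imager, and the helper names (row_to_string, find_target) are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: one row of [r, g, b] pixels as "#RRGGBB#RRGGBB...".
sub row_to_string {
    my ($row) = @_;
    return join '', map { sprintf '#%02X%02X%02X', @$_ } @$row;
}

# Find every top-left (x, y) at which $target appears inside $image.
# Both are array refs of rows, each row an array ref of [r, g, b] pixels.
sub find_target {
    my ($image, $target) = @_;
    my $width        = @{ $image->[0] };
    my $target_width = @{ $target->[0] };

    # The whole image as one string, 7 characters per pixel.
    my $string = join '', map { row_to_string($_) } @$image;

    # Target rows are separated by the pixels that lie outside the target
    # on each image row, so skip that many characters between rows.
    my $gap     = 7 * ($width - $target_width);
    my $pattern = join ".{$gap}", map { quotemeta(row_to_string($_)) } @$target;
    my $regex   = qr/$pattern/s;

    my @hits;
    # The lookahead keeps matches zero-width so overlapping hits are found.
    while ($string =~ /(?=$regex)/g) {
        my $pixel = $-[0] / 7;                # character offset -> pixel index
        my $x     = $pixel % $width;
        my $y     = int($pixel / $width);
        next if $x + $target_width > $width;  # match wrapped around a row end
        push @hits, [$x, $y];
    }
    return @hits;
}

my ($W, $R, $G, $B) = ([255, 255, 255], [255, 0, 0], [0, 255, 0], [0, 0, 255]);
my $image  = [ [$W, $W, $W, $W], [$W, $R, $G, $W], [$W, $B, $R, $W] ];
my $target = [ [$R, $G], [$B, $R] ];

print "found at ($_->[0], $_->[1])\n" for find_target($image, $target);
```

Because every colour code starts with a "#", a match can only begin on a pixel boundary, so the only false positives to filter out are matches that wrap around the end of a row.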

And it worked great, until I tried to scale it up to the type of image you might actually want to look at the most: screenshots. At the time I needed to monitor OS X machines for any unexpected popups, because these were advertising displays, and that sort of thing is embarrassing.

Once I tried to apply the search technique to something the size of a 1024x768 screenshot, the overheads of transforming a million pixels in Perl started to bite hard.

While the regexp search might take 0.01 seconds, building the search image was taking something like 10 seconds. Just barely acceptable in my passive monitoring case, but not good for much else (like, say, writing a solitaire bot).

While on the flight home from YAPC::EU (via Iceland) I finally spent some time reorganising the search code into a driver API, and trying what I think is a much better approach.

If the most expensive part of the job is converting the search image, why not use a NATIVE image format for which fast C encoding already exists, build it in memory, and then regexp it there? We shift more work into generating the search image, and a lot more work into figuring out where the hell the match is in pixel terms, but hopefully for a large net win.

And it turns out that 24-bit Windows BMP files make for a reasonably decent search image format. Of course, there's the small problem of the scanlines going from bottom to top, the fact that the red/green/blue bytes are around the other way, and the small matter of each scanline being dword-aligned, which leaves between 0 and 3 useless bytes at the end of each line depending on its width. But it turns out that with some funky post-match math, judicious use of quotemeta, and some post-processing of matches to remove false positives, you can generate a search expression that will quite comfortably search for an image inside a native Windows BMP file.
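
To make those three quirks concrete, here is a minimal sketch of the idea in plain Perl. It is an illustration under stated assumptions, not the actual BMP24 driver: the pixel data is built by hand instead of by Imager, the 54-byte BMP file header is left out, and all of the helper names are hypothetical.

```perl
use strict;
use warnings;

# Bytes per scanline in a 24-bit BMP: 3 bytes per pixel, rounded up to a
# multiple of 4 (the dword alignment).
sub row_stride {
    my ($width) = @_;
    return int(($width * 3 + 3) / 4) * 4;
}

# Hypothetical helper: one row of [r, g, b] pixels as raw B, G, R bytes.
sub bgr_bytes {
    my ($row) = @_;
    return join '', map { pack 'C3', @{$_}[2, 1, 0] } @$row;
}

# Hypothetical helper: BMP-style pixel data for rows given top to bottom.
# The bottom row is stored first and each scanline is zero-padded.
sub bmp24_pixel_data {
    my ($rows) = @_;
    my $width = @{ $rows->[0] };
    my $pad   = "\0" x (row_stride($width) - 3 * $width);
    return join '', map { bgr_bytes($_) . $pad } reverse @$rows;
}

# Find every top-left (x, y) at which $target appears inside $image.
sub find_in_bmp24 {
    my ($image, $target) = @_;
    my ($width, $height) = (scalar @{ $image->[0] },  scalar @$image);
    my ($tw, $th)        = (scalar @{ $target->[0] }, scalar @$target);
    my $stride = row_stride($width);
    my $data   = bmp24_pixel_data($image);

    # Target rows, bottom first, each escaped with quotemeta and separated
    # by the bytes that lie outside the target on every scanline.
    my $gap     = $stride - 3 * $tw;
    my $pattern = join ".{$gap}", map { quotemeta(bgr_bytes($_)) } reverse @$target;
    my $regex   = qr/$pattern/s;

    my @hits;
    while ($data =~ /(?=$regex)/g) {
        my $offset = $-[0];
        my $in_row = $offset % $stride;
        next if $in_row % 3;              # false positive: not on a pixel boundary
        my $x = $in_row / 3;
        next if $x + $tw > $width;        # false positive: runs into the padding
        my $bmp_row = int($offset / $stride);       # 0 is the bottom scanline
        my $y = $height - 1 - $bmp_row - ($th - 1); # flip, then move to the top row
        push @hits, [$x, $y];
    }
    return @hits;
}

my ($W, $R, $G, $B) = ([255, 255, 255], [255, 0, 0], [0, 255, 0], [0, 0, 255]);
my $bmp_image  = [ [$W, $W, $W], [$W, $R, $G], [$W, $B, $R] ];
my $bmp_target = [ [$R, $G], [$B, $R] ];

print "found at ($_->[0], $_->[1])\n" for find_in_bmp24($bmp_image, $bmp_target);
```

The example image is 3 pixels wide on purpose: 9 bytes of pixel data per scanline gets padded to a 12-byte stride, so the padding has to be accounted for both in the gap between target rows and in the offset-to-pixel math.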

Whereas the old time to capture and prepare a 1024x768 search image as #RRGGBB was around 10-12 seconds, the time to capture and prepare a 1024x768 native BMP file is in the vicinity of 150 milliseconds, which is almost two orders of magnitude faster.

Combining a 0.15 second capture cost with a 0.01 second regexp cost and another 0.01 to generate the search expression, the result is that using the new default "BMP24" driver, Imager::Search can monitor the desktop for images at around 5 frames a second!
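
For anyone who wants to check the arithmetic, here is a small sketch using only the timings quoted above (the variable and function names are just for illustration):

```perl
use strict;
use warnings;

# Per-frame costs in seconds, as quoted in the post.
my %cost = (capture => 0.15, build_expression => 0.01, regexp => 0.01);

# Frames per second is one over the total time spent on each frame.
sub frames_per_second {
    my (%c) = @_;
    my $total = 0;
    $total += $_ for values %c;
    return 1 / $total;
}

# Speedup of the BMP capture step over the old #RRGGBB conversion.
sub speedup {
    my ($old_seconds, $new_seconds) = @_;
    return $old_seconds / $new_seconds;
}

printf "%.1f frames per second\n", frames_per_second(%cost);
printf "capture speedup: %.0fx to %.0fx\n", speedup(10, 0.15), speedup(12, 0.15);
```

The total comes to 0.17 seconds per frame, a little under 6 frames a second, and the capture step is roughly 67 to 80 times faster than before.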

This has the makings of a truly awesome Solitaire bot, or maybe even something like pinball...

This has got me thinking, though: could I make it even faster again?

I'm now pondering the idea of a platform-specific Windows or X11 driver that would bypass Imager for the capture step altogether, and instead generate a search expression for whatever the raw byte string is that comes out of the screen capture system call...

The other major addition I need to make is support for transparency (1-bit transparency for now, anyway) in the search image.

But for now, I'm happy enough with the new fast Imager::Search version to stamp it as 1.00 and release it.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • For a pet project* of mine, I needed to do some image processing/recognition from a video camera. First, I wrote the recognition algorithm with PDL, which turned out to be just a few, maybe five, lines of very dense code.

    It turned out to be very difficult to use the resulting, transformed image in the way I wanted to, because passing it from the video4linux driver (JPEG output) to perl/PDL (binary PNM is the keyword there) to SDLPerl took *a lot* longer than the actual processing.

    So I rewrote the whole thin

    • Given that I'm using a 1024x768 image on 1.5GHz hardware, that suggests that the mechanism I'm using could just about handle your camera feed, albeit at around 100% of CPU. Granted I'm not displaying the results, but this is probably still only around an order of magnitude slower.

      Since the regex itself costs less than 0.01 seconds, that just leaves the overhead of the capture call itself (which could itself contain a Windows internal transformation) and the non-trivial cost of packing the bytestream into Im

  • If we could create an SV which pointed at memory not managed by perl, we could create one which pointed at Imager's internal image representation, and match against that. But I don't know of a way to do that.

    Another option could be to use the getsamples() method to fetch each scanline of the image. This won't avoid the basic overhead of copying, but it will avoid the overhead of Imager's I/O abstraction.

    A final option could be XS that either used i_gsamp() or the image's internal representation to perform